Prostate cancer associated circulating nucleic acid biomarkers

ABSTRACT

The invention provides methods and reagents for diagnosing prostate cancer that are based on the detection of biomarkers in the circulating nucleic acids from a patient to be evaluated.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a divisional application of U.S. application Ser. No. 14/414,882, filed on Jan. 14, 2015, which is a National Stage of International Application No. PCT/US2012/068489, filed Dec. 7, 2012, and which claims priority benefit of U.S. provisional application No. 61/568,065, filed Dec. 7, 2011, which are herein incorporated by reference for all purposes.

BACKGROUND OF THE INVENTION

Methods to detect prostate cancer, including PSA tests, are extremely unreliable (see, e.g., Wever et al., J Natl Cancer Inst 102:352-355, 2010; Schroder et al., N. Engl. J. Med 360:1320-1328, 2009). There is a need for effective detection methods. This invention addresses that need.

BRIEF SUMMARY OF THE INVENTION

The invention is based, in part, on the discovery of cell-free circulating nucleic acids (CNA) biomarkers associated with prostate cancer. In some embodiments, the CNA biomarkers are nucleic acid sequences, in the current invention DNA sequences, that are present in the blood, e.g., in a serum or plasma sample, of a prostate cancer patient, but are rarely present, if at all, in the blood, e.g., a serum or plasma sample, obtained from a normal individual, i.e., in the context of this invention, an individual that does not have prostate cancer. In some embodiments, the CNA biomarkers are nucleic acid sequences, in the current invention DNA sequences, i.e., DNA fragments, that are present in the blood, e.g., in a serum or plasma sample, of a normal individual, but are rarely present, if at all, in the blood, e.g., a serum or plasma sample, obtained from a prostate cancer patient.

Accordingly, in one aspect, the invention provides a method of analyzing CNA in a sample (blood, serum or plasma) from a patient comprising detecting the presence of at least one cell-free DNA having a nucleotide sequence falling within a chromosomal region set forth in Table 1 or Table 4 in the sample. In some embodiments, detecting the level of the at least one biomarker comprises detecting a cell-free DNA molecule having between at least 20 to at least 500 consecutive nucleotides, or, e.g., between at least 50 and at least 400 consecutive nucleotides of a unique sequence within a chromosomal region as set forth in Table 1. In some embodiments, the chromosomal region is set forth in Table 4.

In one embodiment, a method of analyzing circulating free DNA in a patient sample is provided, comprising determining, in a sample that is blood, serum or plasma, the presence or absence, or the amount of, at least 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100 cell-free DNA molecules each having a sequence falling within a different chromosomal region set forth in Table 1 or Table 4, and preferably the sequences of the cell-free DNA molecules are free of repetitive elements. In preferred embodiments, the cell-free DNA molecules have sequences falling within different chromosomal regions in the same table selected from Table 1 or Table 4.

In another aspect, the present invention provides a kit including two or more (e.g., at least 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100) sets of oligonucleotides. In some embodiments, the kit includes 100 or fewer sets of oligonucleotides. Each set comprises one or more oligonucleotides with a nucleotide sequence falling within one single chromosomal region that is set forth in Table 1 or Table 4. Preferably, different oligonucleotide sets correspond to different chromosomal regions within the same table selected from Table 1 or Table 4. Also, preferably the oligonucleotides are free of repetitive elements. Optionally, the oligonucleotides are attached to one or more solid substrates such as microchips and beads.

In another aspect, the present invention provides a method of diagnosing or screening for prostate cancer in a patient. The method includes the steps of: (a) determining, in a sample that is blood, serum or plasma from a patient, the presence or absence or the amount of, at least 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100 cell-free DNA molecules each having a sequence falling within a different chromosomal region set forth in Table 1 or Table 4, and (b) correlating the presence of, or an increased amount of, said first and second cell-free DNAs with an increased likelihood that the patient has prostate cancer. Preferably, the sequences of the cell-free DNA molecules are free of repetitive elements. In preferred embodiments, the cell-free DNA molecules have sequences falling within different chromosomal regions in the same table chosen from Table 1 or Table 4.

In one aspect, the invention provides a method of identifying a patient that has a CNA biomarker associated with prostate cancer, the method comprising detecting an increase in the level, relative to normal, of at least one biomarker designated as “UP” in Table 1 or Table 4, in a CNA sample obtained from serum or plasma from the patient. A biomarker can be identified using any number of methods, including sequencing of CNA as well as use of a probe or probe set to detect the presence of the biomarker.

In some embodiments, the invention provides a method of identifying a patient that has a CNA biomarker associated with prostate cancer, the method comprising detecting a decrease in the level, relative to normal, of at least one biomarker designated as “DOWN” in Table 1 or Table 4 in a CNA sample obtained from serum or plasma from the patient. A biomarker can be identified using any number of methods, including sequencing of CNA as well as use of a probe or probe set to detect the presence of the biomarker.

In a further aspect, the invention provides a kit for identifying a patient that has a biomarker for prostate cancer and/or that has a biomarker associated with a normal individual that does not have prostate cancer, wherein the kit comprises at least one polynucleotide probe to a biomarker set forth in Table 1 or Table 4. Preferably, such a kit comprises probes to multiple biomarkers, e.g., at least 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100, of the biomarkers set forth in Table 1 or Table 4. In some embodiments, the cell-free DNA molecules have a sequence falling within a different chromosomal region set forth in Table 1 or Table 4. In some embodiments, the kit also includes an electronic device or computer software to compare the hybridization patterns of the CNA in the patient sample to a prostate cancer data set comprising a listing of biomarkers that are present in prostate cancer patient CNA, but not CNA samples from normal individuals.

In some embodiments, the presence of the at least one biomarker in CNA is determined by sequencing. In some embodiments, the presence of the at least one biomarker in CNA is determined using an array. In some embodiments, the presence of the at least one biomarker in CNA is determined using an assay that comprises an amplification reaction, such as a polymerase chain reaction (PCR). In some embodiments, a nucleic acid array forming a probe set comprising probes to two or more chromosomal regions set forth in Table 1 or Table 4 is employed. In some embodiments, a nucleic acid array forming a probe set comprising 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, or 27 of the chromosomal regions set forth in Table 4 is employed. In some embodiments, a nucleic acid array forming a probe set comprising 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 of the chromosomal regions set forth in Table 1 is employed.

In an additional aspect, the invention provides a method of detecting prostate cancer in a patient that has, or is suspected of having, prostate cancer, the method comprising contacting DNA from a serum or plasma sample of the patient with a probe that selectively hybridizes to a sequence present on a chromosomal region described herein, e.g., a sequence set forth in Table 1 or Table 4, under conditions in which the probe selectively hybridizes to the sequence; and detecting the presence or absence of hybridization of the probe, wherein the level of hybridization to the sequence is indicative of prostate cancer.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates the flowchart of unsupervised cluster search (UCS) methodology.

FIG. 2 shows a correlation of a chromosomal region biomarker and PSA test.

FIG. 3 shows a ROC curve using the Copy Number Instability (CNI) score in circulating nucleic acids (CNA); Z-scores of >2 were summed in each individual to generate the score.

FIGS. 4(a) and (b) provide an example showing the CNA copy number variations (Z-values) in five normal individuals (a) compared to five prostate cancer patients (b). The outer tracks represent the human chromosomes; chromosomal positions in Mbp are indicated. Each inner circular track represents data for one individual. Significant data points with values >2 or <−2 are highlighted by a larger glyph size. Each data track's y-axis spans from −20 to 20; the two sub-scales indicate values of −10 and 10.
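The score described for FIG. 3 can be illustrated with a minimal sketch. It assumes that per-region Z-scores have already been computed for one individual; the function name `cni_score` and the example values are hypothetical and are not taken from the disclosure. Only Z-scores greater than 2 are summed, following the literal wording of the FIG. 3 description; how values below −2 would contribute is not stated there and is not modeled here.

```python
def cni_score(z_scores, threshold=2.0):
    """Sum the Z-scores that exceed the threshold for one individual.

    The threshold of 2 follows the FIG. 3 description; everything else
    (names, input layout) is an illustrative assumption.
    """
    return sum(z for z in z_scores if z > threshold)


# Hypothetical per-region Z-scores for a single sample.
example_z = [0.5, 2.5, 3.0, -2.75, 1.75]
print(cni_score(example_z))  # only 2.5 and 3.0 exceed the threshold
```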

DETAILED DESCRIPTION OF THE INVENTION

As used herein, a “biomarker” refers to a nucleic acid sequence that corresponds to a chromosomal region, where the level of the nucleic acid in CNA relative to normal is associated with prostate cancer. In some embodiments, in which a biomarker is indicated as “UP” in Table 1 or Table 4, the level in CNA of a prostate cancer patient is increased relative to normal. In some embodiments, in which a biomarker is indicated as “DOWN” in Table 1 or Table 4, the level in CNA of a prostate cancer patient is decreased relative to normal.

In the current invention, a “chromosomal region” listed in Table 1 or Table 4 refers to the region of the chromosome that corresponds to the nucleotide positions indicated in the tables. The nucleotide positions on the chromosomes are numbered according to Homo sapiens (human) genome, hg18/build 36.1 genome version release March 2006. As understood in the art, there are naturally occurring polymorphisms in the genome of individuals. Thus, each chromosome region listed in Table 1 or Table 4 encompasses allelic variants as well as the particular sequence in the database. An allelic variant typically has at least 95% identity, often at least 96%, at least 97%, at least 98%, or at least 99% identity to the sequence of a chromosomal region that is present in a particular database, e.g., the National Center for Biotechnology Information (Homo sapiens Build 36.1 at the website address www.ncbi.nlm.nih.gov/mapview/). Percent identity can be determined using well known algorithms, including the BLAST algorithm, e.g., set to the default parameters. Further, it is understood that the nucleotide sequences of the chromosomes may be improved upon as errors in the current database are discovered and corrected. The term “chromosomal region” encompasses any variant or corrected version of the same region as defined in Table 1 or Table 4. Given the information provided in Table 1 or Table 4 in the present disclosure and the available genome databases, a skilled person in the art will be able to understand the chromosomal regions used for the present invention even after new variants are discovered or errors are corrected.

“Detecting a chromosomal region” in CNA in the context of this invention refers to detecting the level of any sequence from a chromosomal region shown in Table 1 or Table 4, where the sequence detected can be assigned unambiguously to that chromosomal region. Thus, this term refers to the detection of unique sequences from the chromosomal regions. In the current invention, the level of at least one region, typically multiple regions used in combination, in a CNA sample is compared to the range found for such region in a group of “normal” individuals, i.e., in the context of this invention, individuals who do not have cancer or at least have not been diagnosed with cancer. For regions that are increased in level in prostate cancer patients, i.e., regions listed as UP in Table 1 or Table 4, a result is typically considered to be increased if the result for the sample is higher than the 60th, 70th, 75th, 80th, 85th, 90th, 95th, or 99th percentile in normal individuals. For regions that are decreased in level in prostate cancer patients, i.e., regions listed as DOWN in Table 1 or Table 4, a result is typically considered to be decreased if the result for the sample is below the 40th, 30th, 25th, 20th, 15th, 10th, 5th, or 1st percentile in normal individuals. Methods of removing repetitive sequences from the analysis are known in the art and include use of blocking DNA, e.g., when the target nucleic acids are identified by hybridization. In some embodiments, typically where the presence of a prostate cancer biomarker is determined by sequencing the CNA from a patient, well known computer programs and manipulations can be used to remove repetitive sequences from the analysis (see, e.g., the EXAMPLES section). In addition, sequences that have multiple equally fitting alignments to the reference database are typically omitted from further analyses.
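The percentile comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the disclosure: the function names, the input layout, and the percentile convention (the share of normal values strictly below or strictly above the sample value) are all hypothetical choices, and other percentile definitions would give slightly different calls on small reference groups.

```python
def is_increased(value, normal_levels, percentile=95.0):
    """True if `value` lies above the given percentile of the normal group.

    Assumed convention: at least `percentile` percent of the normal
    values are strictly below the sample value.
    """
    below = sum(1 for n in normal_levels if n < value)
    return 100.0 * below / len(normal_levels) >= percentile


def is_decreased(value, normal_levels, percentile=5.0):
    """True if `value` lies below the given percentile of the normal group.

    Assumed convention: at least (100 - percentile) percent of the normal
    values are strictly above the sample value.
    """
    above = sum(1 for n in normal_levels if n > value)
    return 100.0 * above / len(normal_levels) >= 100.0 - percentile


# Hypothetical levels of one region in 20 normal individuals.
normals = [float(i) for i in range(1, 21)]
print(is_increased(25.0, normals))  # sample above every normal value
print(is_decreased(0.0, normals))   # sample below every normal value
```

The cut-offs of 95 and 5 are two of the alternatives listed above; any of the other listed percentiles can be passed through the `percentile` argument.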

The term “detecting a biomarker” as used herein refers to detecting a polynucleotide, e.g., DNA, from a chromosomal region listed in Table 1 or Table 4 in CNA. As used herein, “detecting the level” of a biomarker encompasses quantitative measurements as well as detecting the presence, or absence, of the biomarker. Thus, e.g., the term “detecting an increase in the level of” a biomarker, relative to normal, includes qualitative embodiments in which the biomarker is detected in a patient sample, but not a normal sample. Similarly, the term “detecting a decrease in the level of” a biomarker, relative to normal, includes embodiments in which the biomarker is not detected in a patient sample, but is detected in normal samples. A biomarker is considered to be “present” if any nucleic acid sequence in the CNA is unambiguously assigned to the chromosomal region.

The term “unambiguously assigned” in the context of this invention refers to determining that a DNA detected in the CNA of a patient is from a particular chromosomal region. Thus, in detection methods that employ hybridization, the probe hybridizes specifically to that region. In detection methods that employ amplification, the primer(s) hybridizes specifically to that region. In detection methods that employ sequencing, the sequence is assigned to that region based on well-known algorithms for identity, such as the BLAST algorithm using high-stringency parameters, such as e<0.0001. In addition, such a sequence does not have a further equally fitting hit in the database used.
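For sequencing-based detection, the two conditions above (an e-value below 0.0001 and no further equally fitting hit) can be sketched as a simple filter over alignment hits. The hit records, their field names (`region`, `score`, `evalue`) and the function name are hypothetical; the sketch does not call BLAST and assumes the hits were produced beforehand by whatever alignment tool is used.

```python
def assign_unambiguously(hits, max_evalue=1e-4):
    """Return the region of the single best hit, or None if ambiguous.

    `hits` is a list of dicts with the assumed keys 'region', 'score'
    and 'evalue'. A read is assigned only if its best-scoring hit is
    unique and that hit has an e-value below `max_evalue`.
    """
    if not hits:
        return None
    best_score = max(h["score"] for h in hits)
    best_hits = [h for h in hits if h["score"] == best_score]
    if len(best_hits) != 1:
        return None  # a further equally fitting hit exists
    best = best_hits[0]
    if best["evalue"] >= max_evalue:
        return None  # alignment is not significant enough
    return best["region"]


# Hypothetical hits for one sequenced fragment.
hits = [
    {"region": "region_A", "score": 180.0, "evalue": 1e-40},
    {"region": "region_B", "score": 95.0, "evalue": 1e-12},
]
print(assign_unambiguously(hits))
```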

The term “circulating nucleic acids” or “CNA” refers to cell-free nucleic acids, i.e., that are not contained within any intact cells in human blood, that are present in the blood.

The term “circulating cell-free DNA” as used herein means free DNA molecules of 25 nucleotides or longer that are not contained within any intact cells in human blood, and can be obtained from human serum or plasma.

The term “hybridization” refers to the formation of a duplex structure by two single stranded nucleic acids due to complementary base pairing. Hybridization can occur between exactly complementary nucleic acid strands or between nucleic acid strands that contain minor regions of mismatch. As used herein, the term “substantially complementary” refers to sequences that are complementary except for minor regions of mismatch. Typically, the total number of mismatched nucleotides over a hybridizing region is not more than 3 nucleotides for sequences about 15 nucleotides in length. Conditions under which only exactly complementary nucleic acid strands will hybridize are referred to as “stringent” or “sequence-specific” hybridization conditions. Stable duplexes of substantially complementary nucleic acids can be achieved under less stringent hybridization conditions. Those skilled in the art of nucleic acid technology can determine duplex stability empirically considering a number of variables including, for example, the length and base pair concentration of the oligonucleotides, ionic strength, and incidence of mismatched base pairs. For example, computer software for calculating duplex stability is commercially available from National Biosciences, Inc. (Plymouth, Minn.), e.g., OLIGO version 5, or from DNA Software (Ann Arbor, Mich.), e.g., Visual OMP 6.
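The mismatch criterion for “substantially complementary” sequences can be made concrete with a short sketch. It assumes two DNA strands of equal length written 5′ to 3′ and counts positions where the probe differs from the reverse complement of the target; the function names are hypothetical, and the default limit of 3 mismatches is the figure given above for sequences of about 15 nucleotides.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(probe, target):
    """Count positions where `probe` fails to pair with `target`."""
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    perfect_partner = reverse_complement(target)
    return sum(1 for a, b in zip(probe, perfect_partner) if a != b)


def substantially_complementary(probe, target, max_mismatches=3):
    """True if the duplex has no more than `max_mismatches` mismatches."""
    return count_mismatches(probe, target) <= max_mismatches


# Hypothetical 15-nucleotide target and its exact partner.
target = "ACGTACGTACGTACG"
exact_probe = reverse_complement(target)
print(count_mismatches(exact_probe, target))
```

Scaling the mismatch limit for hybridizing regions much longer than 15 nucleotides is left open here, since the text gives the figure only for that length.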

Stringent, sequence-specific hybridization conditions, under which an oligonucleotide will hybridize only to the target sequence, are well known in the art (see, e.g., the general references provided in the section on detecting polymorphisms in nucleic acid sequences). Stringent conditions are sequence-dependent and will be different in different circumstances. Generally, stringent conditions are selected to be about 5° C. lower to 5° C. higher than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the duplex strands have dissociated. Relaxing the stringency of the hybridizing conditions will allow sequence mismatches to be tolerated; the degree of mismatch tolerated can be controlled by suitable adjustment of the hybridization conditions.
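As a rough illustration of the temperature window described above, the sketch below estimates Tm with the Wallace rule (2 °C per A or T, 4 °C per G or C), a common rule of thumb for short oligonucleotides. The Wallace rule is an assumption introduced here for illustration only; it is not named in the disclosure, it ignores ionic strength and pH, and dedicated duplex-stability software would normally be used instead. The function names are hypothetical.

```python
def wallace_tm(oligo):
    """Estimate Tm in degrees C with the Wallace rule: 2*(A+T) + 4*(G+C).

    Rule-of-thumb approximation for short oligonucleotides only.
    """
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_window(oligo, margin=5.0):
    """Return (low, high) temperatures within `margin` degrees of Tm."""
    tm = wallace_tm(oligo)
    return (tm - margin, tm + margin)


# Hypothetical 15-nucleotide oligonucleotide.
print(stringent_window("ACGTACGTACGTACG"))
```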

The term “primer” refers to an oligonucleotide that acts as a point of initiation of DNA synthesis under conditions in which synthesis of a primer extension product complementary to a nucleic acid strand is induced, i.e., in the presence of four different nucleoside triphosphates and an agent for polymerization (i.e., DNA polymerase or reverse transcriptase) in an appropriate buffer and at a suitable temperature. A primer is preferably a single-stranded oligodeoxyribonucleotide. The primer includes a “hybridizing region” exactly or substantially complementary to the target sequence, preferably about 15 to about 35 nucleotides in length. A primer oligonucleotide can either consist entirely of the hybridizing region or can contain additional features which allow for the detection, immobilization, or manipulation of the amplified product, but which do not alter the ability of the primer to serve as a starting reagent for DNA synthesis. For example, a nucleic acid sequence tail can be included at the 5′ end of the primer that hybridizes to a capture oligonucleotide.

The term “probe” refers to an oligonucleotide that selectively hybridizes to a target nucleic acid under suitable conditions. A probe for detection of the biomarker sequences described herein can be any length, e.g., from 15-500 bp in length. Typically, in probe-based assays, hybridization probes that are less than 50 bp are preferred.

The term “target sequence” or “target region” refers to a region of a nucleic acid that is to be analyzed and comprises the sequence of interest.

As used herein, the terms “nucleic acid,” “polynucleotide” and “oligonucleotide” refer to primers, probes, and oligomer fragments. The terms are not limited by length and are generic to linear polymers of polydeoxyribonucleotides (containing 2-deoxy-D-ribose), polyribonucleotides (containing D-ribose), and any other N-glycoside of a purine or pyrimidine base, or modified purine or pyrimidine bases. These terms include double- and single-stranded DNA, as well as double- and single-stranded RNA. Oligonucleotides for use in the invention may be used as primers and/or probes.

A nucleic acid, polynucleotide or oligonucleotide can comprise phosphodiester linkages or modified linkages including, but not limited to, phosphotriester, phosphoramidate, siloxane, carbonate, carboxymethylester, acetamidate, carbamate, thioether, bridged phosphoramidate, bridged methylene phosphonate, phosphorothioate, methylphosphonate, phosphorodithioate, bridged phosphorothioate or sulfone linkages, and combinations of such linkages.

A nucleic acid, polynucleotide or oligonucleotide can comprise the five biologically occurring bases (adenine, guanine, thymine, cytosine and uracil) and/or bases other than the five biologically occurring bases. These bases may serve a number of purposes, e.g., to stabilize or destabilize hybridization; to promote or inhibit probe degradation; or as attachment points for detectable moieties or quencher moieties. For example, a polynucleotide of the invention can contain one or more modified, non-standard, or derivatized base moieties, including, but not limited to, N6-methyl-adenine, N6-tert-butyl-benzyl-adenine, imidazole, substituted imidazoles, 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxymethyl)uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-methyladenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5′-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, 3-(3-amino-3-N-2-carboxypropyl)uracil, (acp3)w, 2,6-diaminopurine, and 5-propynyl pyrimidine. Other examples of modified, non-standard, or derivatized base moieties may be found in U.S. Pat. Nos. 6,001,611; 5,955,589; 5,844,106; 5,789,562; 5,750,343; 5,728,525; and 5,679,785, each of which is incorporated herein by reference in its entirety. Furthermore, a nucleic acid, polynucleotide or oligonucleotide can comprise one or more modified sugar moieties including, but not limited to, arabinose, 2-fluoroarabinose, xylulose, and a hexose.

The term “repetitive element” as used herein refers to a stretch of DNA sequence of at least 25 nucleotides in length that is present in the human genome in at least 50 copies.
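The definition above (at least 25 nucleotides, at least 50 copies) can be illustrated with a toy k-mer count. This is a sketch under strong simplifying assumptions: the reference is a short in-memory string rather than a genome, only exact matches on one strand are counted, and the function names are hypothetical. Repeat screening on real data would rely on established repeat annotation rather than this naive count.

```python
from collections import Counter


def repetitive_kmers(reference, k=25, min_copies=50):
    """Return every k-mer that occurs at least `min_copies` times."""
    counts = Counter(
        reference[i:i + k] for i in range(len(reference) - k + 1)
    )
    return {kmer for kmer, n in counts.items() if n >= min_copies}


def is_free_of_repetitive_elements(fragment, repeats, k=25):
    """True if no k-mer of `fragment` is in the set of repeated k-mers."""
    return not any(
        fragment[i:i + k] in repeats
        for i in range(len(fragment) - k + 1)
    )


# Toy reference: a unique A/T-only flank followed by 60 tandem copies
# of a hypothetical 25-nucleotide unit.
unit = "ACGTTGCAAGCTTAGGCATCGATCC"
flank = "ATTAATATTTAAATATTAATTTATAAATTATATTTAAATTA"
reference = flank + unit * 60
repeats = repetitive_kmers(reference)
print(is_free_of_repetitive_elements(unit, repeats))
print(is_free_of_repetitive_elements(flank, repeats))
```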

The terms “arrays,” “microarrays,” and “DNA chips” are used herein interchangeably to refer to an array of distinct polynucleotides affixed to a substrate, such as glass, plastic, paper, nylon or other type of membrane, filter, chip, bead, or any other suitable solid support. The polynucleotides can be synthesized directly on the substrate, or synthesized separate from the substrate and then affixed to the substrate. The arrays are prepared using known methods.

Introduction

The invention is based, at least in part, on the identification of CNA sequences from particular chromosomal regions that are present or at an increased amount in the blood of patients that have prostate cancer, but are rarely, if ever, present, or at a lower amount, in the blood of normal patients that do not have prostate cancer. The invention is also based, in part, on the identification of biomarkers in the CNA in normal individuals, i.e., in the context of this invention, individuals not diagnosed with prostate cancer, that are rarely, if ever, present in patients with prostate cancer. Thus, the invention provides methods and devices for analyzing for the presence of sequences from a chromosomal region corresponding to at least one of the chromosomal regions set forth in Table 1 or Table 4.

Accordingly, in one aspect, the invention provides a method of analyzing CNA in a sample (blood, serum or plasma) from a patient comprising detecting the presence of, or an amount of, at least one circulating cell-free DNA having a nucleotide sequence of at least 25 nucleotides falling within a chromosomal region set forth in Table 1. In some embodiments, the invention provides a method of analyzing CNA in a sample (blood, serum or plasma) from a patient comprising detecting the presence of, or an amount of, at least one circulating cell-free DNA having a nucleotide sequence of at least 25 nucleotides falling within a chromosomal region set forth in Table 4. Preferably, the circulating cell-free DNA is free of repetitive elements. In one embodiment, the patient is an individual suspected of having or diagnosed with cancer, e.g., prostate cancer.

By “falling within” it is meant herein that the nucleotide sequence of a circulating cell-free DNA is substantially identical (e.g., greater than 95% identical) to a part of the nucleotide sequence of a chromosome region. In other words, the circulating cell-free DNA can hybridize under stringent conditions to, or be derived from, the chromosomal region.
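The identity criterion in this definition can be sketched with an ungapped comparison. This is a simplification introduced for illustration: the disclosure refers to alignment algorithms such as BLAST, whereas the code below only slides the fragment along the region without gaps and reports whether any placement exceeds 95% identity. The function names and the toy sequences are hypothetical.

```python
def percent_identity(fragment, segment):
    """Percent of matching positions between two equal-length sequences."""
    if not fragment or len(fragment) != len(segment):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(fragment, segment) if a == b)
    return 100.0 * matches / len(fragment)


def falls_within(fragment, region, min_identity=95.0):
    """True if some ungapped placement of `fragment` on `region`
    is more than `min_identity` percent identical."""
    n = len(fragment)
    if n == 0 or n > len(region):
        return False
    return any(
        percent_identity(fragment, region[i:i + n]) > min_identity
        for i in range(len(region) - n + 1)
    )


# Hypothetical 100-nucleotide region and a fragment with 4 substitutions.
region = "ACGT" * 25
fragment = "CATG" + region[4:]
print(percent_identity(fragment, region))
print(falls_within(fragment, region))
```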

In one embodiment, a method of analyzing circulating cell-free DNA in a patient sample is provided, comprising determining, in a sample that is blood, serum or plasma, the presence or the amount of, a plurality of circulating cell-free DNA molecules each having a sequence of at least 25 nucleotides in length, or at least 40, 50, 60, 75, or 100 or more consecutive nucleotides falling within the same one single chromosomal region set forth in Table 1 or Table 4. There may be two or more or any number of different circulating cell-free DNA molecules that are all derived from the same one chromosomal region set forth in Table 1 or Table 4, and in some embodiments, all such circulating cell-free DNA molecules are detected and/or the amounts thereof are determined.

Preferably the sequences of the circulating cell-free DNA molecules are free of repetitive elements.

In one embodiment, a method of analyzing circulating cell-free DNA in a patient sample is provided, comprising determining, in a sample that is blood, serum or plasma, the presence or absence or the amount of, at least 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100 circulating cell-free DNA molecules each having a sequence of at least 25 consecutive nucleotides, or at least 40, 50, 60, 75, or 100, or more consecutive nucleotides falling within a different chromosomal region set forth in Table 1. In some embodiments, the cell-free DNA molecules have a sequence falling within a different chromosomal region set forth in Table 4. Preferably the sequences of the circulating cell-free DNA molecules are free of repetitive elements. In preferred embodiments, the cell-free DNA molecules have sequences falling within different chromosomal regions in the same table that is chosen from Table 1 or Table 4. In one specific embodiment, the presence or absence or the amounts of, at least 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100, circulating cell-free DNA molecules are determined, the sequence of each falling within a different chromosomal region set forth in Table 1. In some embodiments, the circulating cell-free DNA molecules have a sequence falling within a different chromosomal region set forth in Table 4.

In another specific embodiment, the method of analyzing circulating cell-free DNA includes the steps of: isolating, from a blood, serum or plasma sample of a patient, substantially all circulating cell-free DNA molecules having a length of at least 20, 25, 30, 40, 50, 75 or 100 consecutive nucleotides, or between 50 and 400 nucleotides, and contacting the circulating cell-free DNA molecules to a plurality of oligonucleotides (e.g., on a DNA chip or microarray) to determine if one or more of the circulating cell-free DNA molecules hybridizes to any one of the plurality of oligonucleotide probes under stringent conditions. Each of the oligonucleotide probes has a nucleotide sequence identical to a part of the sequence of a chromosomal region set forth in Table 1. In some embodiments, each of the oligonucleotide probes has a nucleotide sequence identical to a part of the sequence of a chromosomal region set forth in Table 4. Thus, if a circulating DNA molecule hybridizes under stringent conditions to one of the oligonucleotide probes, it indicates that the circulating DNA molecule has a nucleotide sequence falling within a chromosomal region set forth in Table 1 or Table 4, and indicates the presence of the circulating DNA molecule. The level of the circulating DNA molecule can be determined by determining the amount of hybridized probe(s).

In the above various embodiments, preferably the circulating cell-free DNA molecules are at least 25 consecutive nucleotides in length (preferably at least 50, 70, 80, 100, 120 or 200 consecutive nucleotides in length). More preferably, the circulating cell-free DNA molecules have between about 50 and about 300 or 400, preferably from about 75 to about 300 or 400, more preferably from about 100 to about 200 consecutive nucleotides of a unique sequence within a chromosomal region as set forth in Table 1 or Table 4.

In another aspect, the present invention provides a method of diagnosing or screening for prostate cancer in a patient. The method includes the steps of: (a) determining, in a sample that is blood, serum or plasma from a patient, the level of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 61, 62, 63, 64, 65, or 66 circulating cell-free DNA molecules each having a sequence of at least 25 nucleotides in length falling within a different chromosomal region designated as “UP” in Table 1, and (b) correlating the presence of an increased level of the circulating cell-free DNAs, relative to normal, with an increased likelihood that the patient has prostate cancer.

In another aspect, the method includes the steps of: (a) determining, in a sample that is blood, serum or plasma from a patient, the level of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 16, 17, 18, 19, or 20 circulating cell-free DNA molecules each having a sequence of at least 25 nucleotides in length falling within a different chromosomal region designated as “UP” in Table 4, and (b) correlating the presence of an increased level of the circulating cell-free DNAs, relative to normal, with an increased likelihood that the patient has prostate cancer.

In another embodiment, the method of the invention includes the steps of: (a) determining, in a sample that is blood, serum or plasma from a patient, the level of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 31, 32, 33, or 34 circulating cell-free DNA molecules each having a sequence of at least 25 nucleotides in length falling within a different chromosomal region designated as “DOWN” in Table 1; and (b) correlating the presence of a decreased level of the circulating cell-free DNAs, relative to normal, with an increased likelihood that the patient has prostate cancer. In some embodiments, the method of the invention includes the steps of: (a) determining, in a sample that is blood, serum or plasma from a patient, the level of at least 1, 2, 3, 4, 5, 6, or 7 circulating cell-free DNA molecules each having a sequence of at least 25 nucleotides in length falling within a different chromosomal region designated as “DOWN” in Table 4; and (b) correlating the presence of a decreased level of the circulating cell-free DNAs, relative to normal, with an increased likelihood that the patient has prostate cancer.

When the steps of the above methods are applied to a patient diagnosed with cancer, the patient may be monitored for the status of prostate cancer, or for determining the treatment effect of a particular treatment regimen, or detecting cancer recurrence or relapse.

When the steps of the above methods are applied to a patient diagnosed with prostate cancer, the patient may be monitored for the status of prostate cancer, or for determining the treatment effect of a particular treatment regimen, or detecting cancer recurrence or relapse.

In the diagnosis/monitoring method of the present invention, preferably the sequences of the circulating cell-free DNA molecules are free of repetitive elements. In preferred embodiments, the cell-free DNA molecules have sequences falling within different chromosomal regions set forth in Table 1 or Table 4.

In one embodiment, a method of diagnosing prostate cancer in an individual is provided, comprising (a) determining the levels of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 61, 62, 63, 64, 65, or 66 circulating cell-free DNA molecules each having a sequence of at least 25 nucleotides in length falling within a different chromosomal region designated as “UP” in Table 1; and (b) correlating the presence of an increased level, relative to normal, of one or more of the circulating cell-free DNA molecules with an increased likelihood that the individual has prostate cancer or a recurrence of prostate cancer or a failure of treatment for prostate cancer.

In one embodiment, a method of diagnosing/monitoring prostate cancer in an individual is provided, comprising (a) determining the levels of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 31, 32, 33, or 34 circulating cell-free DNA molecules each having a sequence of at least 25 nucleotides in length falling within a different chromosomal region designated as “DOWN” in Table 1; and (b) correlating the presence of a decreased level, relative to normal, of one or more of the circulating cell-free DNA molecules with an increased likelihood that the individual has prostate cancer or a recurrence of prostate cancer or a failure of treatment for prostate cancer.

In another embodiment, a method of diagnosing prostate cancer in an individual is provided, comprising (a) determining the levels of at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 16, 17, 18, 19, or 20 circulating cell-free DNA molecules each having a sequence of at least 25 nucleotides in length falling within a different chromosomal region designated as “UP” in Table 4; and (b) correlating the presence of an increased level, relative to normal, of one or more of the circulating cell-free DNA molecules with an increased likelihood that the individual has prostate cancer or a recurrence of prostate cancer or a failure of treatment for prostate cancer.

In another embodiment, a method of diagnosing/monitoring prostate cancer in an individual is provided, comprising (a) determining the levels of at least 1, 2, 3, 4, 5, 6, or 7 circulating cell-free DNA molecules each having a sequence of at least 25 nucleotides in length falling within a different chromosomal region designated as “DOWN” in Table 4; and (b) correlating the presence of a decreased level, relative to normal, of one or more of the circulating cell-free DNA molecules with an increased likelihood that the individual has prostate cancer or a recurrence of prostate cancer or a failure of treatment for prostate cancer.

In yet another embodiment, the method of diagnosing, monitoring or screening for prostate cancer in a patient includes determining, in a sample that is blood, serum or plasma from the patient, the level of each and all circulating cell-free DNAs, each having a sequence falling within the same one single chromosomal region designated as “UP” in Table 1 or Table 4; and correlating an increased total level of said circulating cell-free DNAs with an increased likelihood that said patient has prostate cancer, or recurrence of prostate cancer. In other words, there can be any number of, and typically many, different circulating cell-free DNA molecules derived from one single same chromosomal region set forth in Table 1 or Table 4, and all of such different circulating cell-free DNA molecules are detected and the level determined, and correlation with the status of prostate cancer is made.

In another embodiment, the method of diagnosing, monitoring or screening for prostate cancer in a patient includes determining, in a sample that is blood, serum or plasma from the patient, the level of each and all circulating cell-free DNAs, each having a sequence falling within the same one single chromosomal region designated as “DOWN” in Table 1 or Table 4; and correlating a decreased level of said circulating cell-free DNAs with an increased likelihood that said patient has prostate cancer, or recurrence of prostate cancer. In other words, there can be any number of, and typically many, different circulating cell-free DNA molecules derived from one single same chromosomal region set forth in Table 1 or Table 4, and all of such different circulating cell-free DNA molecules are detected and the level determined, and correlation with the status of prostate cancer is made.

In a specific embodiment, substantially all circulating cell-free DNA molecules having a length of at least 20, 25, 30, 40, 50, 75 or 100 consecutive nucleotides, or between 50 and 400 nucleotides, are isolated from a blood, serum or plasma sample of a patient. The sequence of at least some representative portion of each of the isolated circulating cell-free DNA molecules is determined, and compared with one or more of the sequences of the chromosomal regions set forth in Table 1 or Table 4 to determine whether the sequence of a circulating cell-free DNA falls within a chromosomal region designated as “UP” in Table 1 or Table 4, and to determine the level of the circulating DNA having said sequence. If the level is increased relative to normal, a diagnosis of prostate cancer is made. In the case of a patient treated with a therapy for prostate cancer, recurrence is indicated if an increase, relative to normal, in the level of a circulating cell-free DNA that falls within a chromosomal region designated as “UP” in Table 1 or Table 4 is detected. In preferred embodiments, a diagnosis of prostate cancer or prostate cancer treatment failure or recurrence is indicated if two or more circulating cell-free DNA molecules that fall within 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 66 or more chromosomal regions designated as “UP” in Table 1 are increased. In more preferred embodiments, a diagnosis of prostate cancer or prostate cancer treatment failure or recurrence is indicated if two or more circulating cell-free DNA molecules that fall within 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or more chromosomal regions designated as “UP” in Table 4 are increased.
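By way of illustration only, the comparison step described above can be sketched in Python as an interval test on aligned fragments followed by a per-region count. The sketch assumes the fragments have already been aligned to a reference genome; the names `Region`, `Fragment`, `assign_region` and `count_fragments`, and all coordinates, are hypothetical and are not taken from Table 1 or Table 4.

```python
from collections import Counter
from typing import NamedTuple, Optional, Sequence


class Region(NamedTuple):
    """A chromosomal region (hypothetical placeholder values only)."""
    name: str
    chrom: str
    start: int      # inclusive
    end: int        # exclusive
    direction: str  # "UP" or "DOWN"


class Fragment(NamedTuple):
    """One sequenced cell-free DNA fragment after alignment (assumed input)."""
    chrom: str
    start: int  # inclusive
    end: int    # exclusive


def assign_region(frag: Fragment, regions: Sequence[Region]) -> Optional[Region]:
    """Return the first region that fully contains the fragment, else None."""
    for region in regions:
        if (frag.chrom == region.chrom
                and frag.start >= region.start
                and frag.end <= region.end):
            return region
    return None


def count_fragments(fragments, regions, min_len=25):
    """Count fragments of at least min_len nucleotides falling within each region."""
    counts = Counter({region.name: 0 for region in regions})
    for frag in fragments:
        if frag.end - frag.start < min_len:
            continue  # shorter than the minimum fragment length
        region = assign_region(frag, regions)
        if region is not None:
            counts[region.name] += 1
    return counts


# Hypothetical coordinates for illustration only.
regions = [
    Region("region_A", "chr1", 1_000, 2_000, "UP"),
    Region("region_B", "chr2", 5_000, 6_000, "DOWN"),
]
fragments = [
    Fragment("chr1", 1_100, 1_260),  # 160 nt, inside region_A
    Fragment("chr1", 1_500, 1_520),  # 20 nt, below min_len
    Fragment("chr2", 5_900, 6_100),  # extends past the end of region_B
    Fragment("chr1", 1_800, 1_950),  # 150 nt, inside region_A
]
counts = count_fragments(fragments, regions)
```

Requiring full containment of the fragment within the region is one possible reading of "falls within"; an overlap-based rule could be substituted without changing the rest of the sketch.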

In another specific embodiment, substantially all circulating cell-free DNA molecules having a length of at least 20, 25, 30, 40, 50, 75 or 100 consecutive nucleotides, or between 50 and 400 nucleotides, are isolated from a blood, serum or plasma sample of a patient. These circulating cell-free DNA molecules, or a representative portion thereof, are hybridized to a microarray that is described above in the context of the kit invention to determine if one of the circulating cell-free DNA molecules hybridizes to any one of a plurality of oligonucleotide probes under stringent conditions. Each of the oligonucleotide probes has a nucleotide sequence identical to a part of the sequence of a chromosomal region designated as “UP” in Table 1 or Table 4. Thus, if a circulating DNA molecule hybridizes under stringent conditions to one of the oligonucleotide probes, it indicates that the circulating DNA molecule has a nucleotide sequence falling within a chromosomal region set forth in Table 1 or Table 4, and the level is determined. If the level is increased, relative to normal, a diagnosis of prostate cancer is made. In the case of a patient treated with a therapy for prostate cancer, recurrence is indicated if an increase in the level of a circulating cell-free DNA that falls within a chromosomal region designated as “UP” in Table 1 is detected. In preferred embodiments, a diagnosis of prostate cancer or prostate cancer treatment failure or recurrence is indicated if two or more circulating cell-free DNA molecules that fall within 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 66 or more chromosomal regions designated as “UP” in Table 1 are increased. In more preferred embodiments, a diagnosis of prostate cancer or prostate cancer treatment failure or recurrence is indicated if two or more circulating cell-free DNA molecules that fall within 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more chromosomal regions designated as “UP” in Table 4 are increased.

In a specific embodiment, substantially all circulating cell-free DNA molecules having a length of at least 20, 25, 30, 40, 50, 75 or 100 consecutive nucleotides, or between 50 and 400 nucleotides, are isolated from a blood, serum or plasma sample of a patient. The sequence of at least some representative portion of each of the isolated circulating cell-free DNA molecules is determined, and compared with one or more of the sequences of the chromosomal regions set forth in Table 1 or Table 4 to determine whether the sequence of a circulating cell-free DNA falls within a chromosomal region designated as “DOWN” in Table 1 or Table 4, and to determine the level of the polynucleotide having said sequence. If the level is decreased relative to normal, a diagnosis of prostate cancer is made. In the case of a patient treated with a therapy for prostate cancer, recurrence is indicated if a decrease, relative to normal, in the level of a circulating cell-free DNA that falls within a chromosomal region designated as “DOWN” in Table 1 or Table 4 is detected. In preferred embodiments, a diagnosis of prostate cancer or prostate cancer treatment failure or recurrence is indicated if two or more circulating cell-free DNA molecules that fall within 2, 3, 4, 5, 6, 7, 8, 9, 10, or more chromosomal regions designated as “DOWN” in Table 1 are decreased. In more preferred embodiments, a diagnosis of prostate cancer or prostate cancer treatment failure or recurrence is indicated if two or more circulating cell-free DNA molecules that fall within 2, 3, 4, 5, 6, 7, or more chromosomal regions designated as “DOWN” in Table 4 are decreased.

In another specific embodiment, substantially all circulating cell-free DNA molecules having a length of at least 20, 25, 30, 40, 50, 75 or 100 consecutive nucleotides, or between 50 and 400 nucleotides, are isolated from a blood, serum or plasma sample of a patient. These circulating cell-free DNA molecules, or a representative portion thereof, are hybridized to a microarray that is described above in the context of the kit invention to determine if one of the circulating cell-free DNA molecules hybridizes to any one of a plurality of oligonucleotide probes under stringent conditions. Each of the oligonucleotide probes has a nucleotide sequence identical to a part of the sequence of a chromosomal region designated as “DOWN” in Table 1 or Table 4. Thus, if a circulating DNA molecule hybridizes under stringent conditions to one of the oligonucleotide probes, it indicates that the circulating DNA molecule has a nucleotide sequence falling within a chromosomal region set forth in Table 1 or Table 4, and the level is determined. If the level is decreased, relative to normal, a diagnosis of prostate cancer is made. In the case of a patient treated with a therapy for prostate cancer, recurrence is indicated if a decrease in the level of a circulating cell-free DNA that falls within a chromosomal region designated as “DOWN” in Table 1 or Table 4 is detected. In preferred embodiments, a diagnosis of prostate cancer or prostate cancer treatment failure or recurrence is indicated if two or more circulating cell-free DNA molecules that fall within 2, 3, 4, 5, 6, 7, 8, 9, 10, or more chromosomal regions designated as “DOWN” in Table 1 are decreased. In more preferred embodiments, a diagnosis of prostate cancer or prostate cancer treatment failure or recurrence is indicated if two or more circulating cell-free DNA molecules that fall within 2, 3, 4, 5, 6, or 7 chromosomal regions designated as “DOWN” in Table 4 are decreased.

In the above various embodiments, preferably the circulating cell-free DNA molecules are at least 25 consecutive nucleotides in length (preferably at least 50, 70, 80, 100, 120 or 200 consecutive nucleotides in length). More preferably, the circulating cell-free DNA molecules have between about 50 and about 300 or 400, preferably from about 75 to about 300 or 400, more preferably from about 100 to about 200 consecutive nucleotides of a unique sequence within a chromosomal region as set forth in Table 1 or Table 4.

Detection of Circulating Nucleic Acids in the Blood

In order to detect the presence of circulating nucleic acids in the blood of patients that may have, or are suspected of having, prostate cancer, a blood sample is obtained from the patient. Serum or plasma from the blood sample is then analyzed for the presence of a circulating cell-free DNA or biomarker as described herein. Nucleic acids can be isolated from serum or plasma using well known techniques, see, e.g., the example sections. In the context of the current invention, the nucleic acid sequences that are analyzed are DNA sequences. Thus, in this section, methods described as evaluating “nucleic acids” refer to the evaluation of DNA.

Detection techniques for evaluating nucleic acids for the presence of a biomarker involve procedures well known in the field of molecular genetics. Further, many of the methods involve amplification of nucleic acids. Ample guidance for performing these procedures is provided in the art. Exemplary references include manuals such as PCR Technology: Principles and Applications for DNA Amplification (ed. H. A. Erlich, Freeman Press, NY, N.Y., 1992); PCR Protocols: A Guide to Methods and Applications (eds. Innis, et al., Academic Press, San Diego, Calif., 1990); Current Protocols in Molecular Biology, Ausubel, 1994-1999, including supplemental updates through April 2004; Sambrook & Russell, Molecular Cloning, A Laboratory Manual (3rd Ed, 2001).

Although the methods may employ PCR steps, other amplification protocols may also be used. Suitable amplification methods include ligase chain reaction (see, e.g., Wu & Wallace, Genomics 4:560-569, 1988); strand displacement assay (see, e.g., Walker et al., Proc. Natl. Acad. Sci. USA 89:392-396, 1992; U.S. Pat. No. 5,455,166); and several transcription-based amplification systems, including the methods described in U.S. Pat. Nos. 5,437,990; 5,409,818; and 5,399,491; the transcription amplification system (TAS) (Kwoh et al., Proc. Natl. Acad. Sci. USA 86:1173-1177, 1989); and self-sustained sequence replication (3SR) (Guatelli et al., Proc. Natl. Acad. Sci. USA 87:1874-1878, 1990; WO 92/08800). Alternatively, methods that amplify the probe to detectable levels can be used, such as Qβ-replicase amplification (Kramer & Lizardi, Nature 339:401-402, 1989; Lomeli et al., Clin. Chem. 35:1826-1831, 1989). A review of known amplification methods is provided, for example, by Abramson and Myers in Current Opinion in Biotechnology 4:41-47, 1993.

In some embodiments, the detection of a biomarker in the CNA of a patient is performed using oligonucleotide primers and/or probes to detect a target sequence, wherein the target sequence is present in (e.g., comprises some unambiguously assigned portion of) any of the chromosomal regions listed in Table 1 or Table 4. Oligonucleotides can be prepared by any suitable method, usually chemical synthesis, and can also be purchased through commercial sources. Oligonucleotides can include modified phosphodiester linkages (e.g., phosphorothioate, methylphosphonate, phosphoramidate, or boranophosphate), and the incorporation of such linkages, or of linkages other than a phosphorous acid derivative, into an oligonucleotide may be used to prevent cleavage at a selected site. In addition, the use of 2′-amino modified sugars tends to favor displacement over digestion of the oligonucleotide when hybridized to a nucleic acid that is also the template for synthesis of a new nucleic acid strand.

In one embodiment, the biomarker is identified by hybridization under sequence-specific hybridization conditions with a probe that targets a chromosomal region described herein (e.g., targets some unambiguously assigned portion of any of the chromosomal regions listed in Table 1 or Table 4). The probe used for this analysis can be a long probe, or sets of short oligonucleotide probes, e.g., from about 20 to about 150 nucleotides in length, may be employed.

Suitable hybridization formats are well known in the art, including but not limited to, solution phase, solid phase, oligonucleotide array formats, mixed phase, or in situ hybridization assays. In solution (or liquid) phase hybridizations, both the target nucleic acid and the probe or primers are free to interact in the reaction mixture. Techniques such as real-time PCR systems have also been developed that permit analysis, e.g., quantification, of amplified products during a PCR reaction. In this type of reaction, hybridization with a specific oligonucleotide probe occurs during the amplification program to identify the presence of a target nucleic acid. Hybridization of oligonucleotide probes ensures the highest specificity owing to a thermodynamically controlled two-state transition. Examples of such assay formats are fluorescence resonance energy transfer hybridization probes, molecular beacons, molecular scorpions, and exonuclease hybridization probes (e.g., reviewed in Bustin, J. Mol. Endocrin. 25:169-93, 2000).

Suitable assay formats include array-based formats, described in greater detail below in the “Device” section, where the probe is typically immobilized. Alternatively, the target may be immobilized.

In a format where the target is immobilized, amplified target DNA is immobilized on a solid support and the target complex is incubated with the probe under suitable hybridization conditions; unhybridized probe is removed by washing under suitably stringent conditions; and the solid support is monitored for the presence of bound probe. In formats where the probes are immobilized on a solid support, the target DNA is typically labeled, usually during amplification. The immobilized probe is incubated with the amplified target DNA under suitable hybridization conditions; unhybridized target DNA is removed by washing under suitably stringent conditions; and the solid support/probe is monitored for the presence of bound target DNA.

In typical embodiments, multiple probes are immobilized on a solid support and the target chromosomal regions in the CNA from a patient are analyzed using the multiple probes simultaneously. Examples of nucleic acid arrays are described in WO 95/11995.

In an alternative probe-less method, amplification of a target nucleic acid present in a chromosomal region is performed using nucleic acid primers to the chromosomal region, and the amplified nucleic acid is detected by monitoring the increase in the total amount of double-stranded DNA in the reaction mixture, as described, e.g., in U.S. Pat. No. 5,994,056; and European Patent Publication Nos. 487,218 and 512,334. The detection of double-stranded target DNA relies on the increased fluorescence that various DNA-binding dyes, e.g., SYBR Green, exhibit when bound to double-stranded DNA.

As appreciated by one of skill in the art, specific amplification methods can be performed in reactions that employ multiple primers to target the chromosomal regions such that the biomarker can be adequately covered.

DNA Sequencing

In preferred embodiments, the presence of a sequence from a chromosomal region set forth in Table 1 or Table 4 in the CNA from a patient undergoing evaluation is detected by direct sequencing. Such sequencing, especially using the Roche 454, Illumina, and Applied Biosystems sequencing systems mentioned below or similar advanced sequencing systems, can include quantitation (i.e., determining the level) of nucleic acids having a particular sequence. Such quantitation can be used in the embodiments of the invention that involve determining the level of a biomarker (some embodiments of which involve correlating a particular level to the presence or absence of cancer). Methods include, e.g., dideoxy sequencing-based methods, although other methods such as Maxam and Gilbert sequencing are also known (see, e.g., Sambrook and Russell, supra). In typical embodiments, CNA from a patient is sequenced using a large-scale sequencing method that provides the ability to obtain sequence information from many reads. Such sequencing platforms include those commercialized by Roche 454 Life Sciences (GS systems), Illumina (e.g., HiSeq, MiSeq) and Applied Biosystems (e.g., SOLiD systems).
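As a non-limiting sketch of how a level might be derived from sequencing reads, the following Python function scales the number of reads assigned to each chromosomal region by the total number of mapped reads in the sample. Normalizing to counts per million mapped reads is an assumption made here for illustration; the function name `counts_per_million` and the example values are hypothetical.

```python
def counts_per_million(region_counts, total_mapped_reads):
    """Scale raw per-region read counts to counts per million mapped reads.

    This normalization is an illustrative assumption; it makes levels
    comparable between samples sequenced to different depths.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return {
        name: count * 1_000_000 / total_mapped_reads
        for name, count in region_counts.items()
    }


# Hypothetical read counts for two regions in a sample of 2,000,000 mapped reads.
levels = counts_per_million(
    {"region_A": 30, "region_B": 5},
    total_mapped_reads=2_000_000,
)
```

With the hypothetical inputs shown, `levels` holds 15.0 for `region_A` and 2.5 for `region_B`.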

The Roche 454 Life Sciences sequencing platform involves using emulsion PCR and immobilizing DNA fragments onto beads. Incorporation of nucleotides during synthesis is detected by measuring light that is generated when a nucleotide is incorporated.

The Illumina technology involves the attachment of randomly fragmented genomic DNA to a planar, optically transparent surface. Attached DNA fragments are extended and bridge amplified to create an ultra-high density sequencing flow cell with clusters containing copies of the same template. These templates are sequenced using a sequencing-by-synthesis technology that employs reversible terminators with removable fluorescent dyes.

Methods that employ sequencing by hybridization may also be used. Such methods, e.g., as used in the ABI SOLiD4+ technology, use a pool of all possible oligonucleotides of a fixed length, labeled according to the sequenced position. Oligonucleotides are annealed and ligated; the preferential ligation by DNA ligase for matching sequences results in a signal informative of the nucleotide at that position.

The sequence can be determined using any other DNA sequencing method including, e.g., methods that use semiconductor technology to detect nucleotides that are incorporated into an extended primer by measuring changes in current that occur when a nucleotide is incorporated (see, e.g., U.S. Patent Application Publication Nos. 20090127589 and 20100035252). Other techniques include direct label-free exonuclease sequencing in which nucleotides cleaved from the nucleic acid are detected by passing through a nanopore (Oxford Nanopore) (Clark et al., Nature Nanotechnology 4: 265-270, 2009); and Single Molecule Real Time (SMRT™) DNA sequencing technology (Pacific Biosciences), which is a sequencing-by-synthesis technique.

Devices and Kits

In a further aspect, the invention provides diagnostic devices and kits useful for identifying one or more prostate cancer-associated biomarkers in the CNA from a patient where the one or more biomarkers is a sequence corresponding to any of the chromosomal regions set forth in Table 1 and/or Table 4. As will be apparent to skilled artisans, the kit of the present invention is useful in the above-discussed method for analyzing circulating cell-free DNA in a patient sample and in diagnosing, screening or monitoring prostate cancer as described above.

Thus, in one aspect, the present invention provides the use of at least one oligonucleotide for the manufacture of a diagnostic kit useful in diagnosing, screening or monitoring prostate cancer. The nucleotide sequence of the oligonucleotide falls within a chromosomal region set forth in Table 1 or Table 4.

Preferably, the kit of the present invention includes one, two or more (e.g., at least 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100, preferably from one to 100 or from 1 to 27) sets of oligonucleotides. Each set comprises one or more oligonucleotides (e.g., from about one to about 10,000, preferably from 50, 100, 200 or 300 to about 10,000). All of the nucleotide sequences of such one or more oligonucleotides in each set fall within the same one single chromosomal region that is set forth in Table 1. In some embodiments, all of the nucleotide sequences of such one or more oligonucleotides in each set fall within the same one single chromosomal region that is set forth in Table 4. Each oligonucleotide should have from about 18 to 100 nucleotides, or from 20 to about 50 nucleotides, and is capable of hybridizing, under stringent hybridization conditions, to the chromosomal region in which its sequence falls. The oligonucleotides are useful as probes for detecting circulating cell-free DNA molecules derived from the chromosomal regions. Preferably, each set includes a sufficient number of oligonucleotides with sequences mapped to one chromosomal region such that any circulating cell-free DNA molecules derived from the chromosomal region can be detected with the oligonucleotide set. Thus, the number of oligonucleotides required in each set is determined by the total length of unique nucleotide sequence of a particular chromosomal region, as will be apparent to skilled artisans. Such total lengths are indicated in Table 1 and Table 4.
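The dependence of the number of oligonucleotides on the total length of unique sequence can be illustrated with a simple tiling calculation. The sketch below assumes fixed-length probes placed at a regular step along a contiguous stretch of unique sequence, with a final probe ending at the last base; the function name `probes_needed`, the 50-nucleotide probe length and the 25-nucleotide step are hypothetical choices made only for this example.

```python
def probes_needed(unique_length, probe_length=50, step=25):
    """Number of fixed-length probes that tile a unique sequence end to end.

    Probes start every `step` bases; one final probe is placed so that it
    ends at the last base. Returns 0 if the sequence is shorter than one probe.
    """
    if probe_length <= 0 or step <= 0:
        raise ValueError("probe_length and step must be positive")
    if unique_length < probe_length:
        return 0
    # Integer ceiling of (unique_length - probe_length) / step, plus the first probe.
    return (unique_length - probe_length + step - 1) // step + 1


# Hypothetical region lengths, not taken from Table 1 or Table 4.
n_exact = probes_needed(1_000)    # probes at 0, 25, ..., 950
n_ragged = probes_needed(1_010)   # one extra probe to reach the last base
```

A step equal to the probe length gives end-to-end tiling without overlap, whereas a smaller step yields overlapping probes and a correspondingly larger set.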

Preferably, in the kit of the present invention, different oligonucleotide sets correspond to different chromosomal regions within the same table. Preferably, the oligonucleotides are free of repetitive elements. Optionally, the oligonucleotides are attached to one or more solid substrates such as microchips and beads. In preferred embodiments, the kit is a microarray with the above oligonucleotides.

In one embodiment, the kit of the present invention includes a plurality of oligonucleotide sets capable of hybridizing to the chromosomal regions set forth in the tables. That is, the kit includes oligonucleotide probes corresponding to each and every chromosomal region set forth in Table 1 or Table 4, such that all circulating cell-free DNA derived from any chromosomal region set forth in Table 1 or Table 4 can be detected using the kit.

Use of the oligonucleotides included in the kit described above for the manufacture of a kit useful for diagnosing, screening or monitoring prostate cancer is also contemplated. The manufacturing of such a kit should be apparent to a skilled artisan.

In some embodiments, a diagnostic device comprises probes to detect at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60, 75, 80, 85, 90, 95, or 100 chromosomal regions set forth in Table 1. In other embodiments, a diagnostic device comprises probes to detect at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 21, 22, 23, 24, 25, 26 or 27 chromosomal regions set forth in Table 4. In some embodiments, the present invention provides probes attached to a solid support, such as an array slide or chip, e.g., as described in DNA Microarrays: A Molecular Cloning Manual, 2003, Eds. Bowtell and Sambrook, Cold Spring Harbor Laboratory Press. Construction of such devices is well known in the art, for example as described in U.S. Pat. No. 5,837,832; PCT application WO 95/11995; U.S. Pat. Nos. 5,807,522, 7,157,229, 7,083,975, 6,444,175, 6,375,903, 6,315,958, 6,295,153, and 5,143,854; and Patent Publication Nos. 2007/0037274, 2007/0140906, 2004/0126757, 2004/0110212, 2004/0110211, 2003/0143550, 2003/0003032, and 2002/0041420. Nucleic acid arrays are also reviewed in the following references: Biotechnol Annu Rev 8:85-101 (2002); Sosnowski et al, Psychiatr Genet 12(4):181-92 (December 2002); Heller, Annu Rev Biomed Eng 4: 129-53 (2002); Kolchinsky et al, Hum. Mutat 19(4):343-60 (April 2002); and McGail et al, Adv Biochem Eng Biotechnol 77:21-42 (2002).

Any number of probes may be implemented in an array. A probe set that hybridizes to different, preferably unique, segments of a chromosomal region may be used where the probe set detects any part of the chromosomal region. Alternatively, a single probe to a chromosomal region may be immobilized to a solid surface. Polynucleotide probes can be synthesized at designated areas (or synthesized separately and then affixed to designated areas) on a substrate, e.g., using a light-directed chemical process. Typical synthetic polynucleotides can be about 15-200 nucleotides in length.

The kit can include multiple biomarker detection reagents, or one or more biomarker detection reagents in combination with one or more other types of elements or components (e.g., other types of biochemical reagents, containers, packages such as packaging intended for commercial sale, substrates to which biomarker detection reagents are attached, electronic hardware components, etc.). Accordingly, the present invention further provides biomarker detection kits and systems, including but not limited to arrays/microarrays of nucleic acid molecules, and beads that contain one or more probes or other detection reagents for detecting one or more biomarkers of the present invention. The kits can optionally include various electronic hardware components; for example, arrays (“DNA chips”) and microfluidic systems (“lab-on-a-chip” systems) provided by various manufacturers typically comprise hardware components. Other kits may not include electronic hardware components, but may be comprised of, for example, one or more biomarker detection reagents (along with, optionally, other biochemical reagents) packaged in one or more containers.

Biomarker detection kits/systems may contain, for example, one or more probes, or sets of probes, that hybridize to a nucleic acid molecule present in a chromosomal region set forth in Table 1 or Table 4.

A biomarker detection kit of the present invention may include components that are used to prepare CNA from a blood sample from a patient for the subsequent amplification and/or detection of a biomarker.

Correlating the Presence of Biomarkers with Prostate Cancer

The present invention provides methods and reagents for detecting the presence of a biomarker in CNA from a patient that has prostate cancer or that is being evaluated to determine if the patient may have prostate cancer. In the context of the invention, “detection” or “identification” or “identifying the presence” or “detecting the presence” of a biomarker associated with prostate cancer in a CNA sample from a patient refers to determining any level of the biomarker in the CNA of the patient where the level is greater than a threshold value that distinguishes between prostate cancer and non-prostate cancer CNA samples for a given assay.

In the current invention, for example, the presence of, or increase in the level of, relative to normal, any one of the chromosomal regions (i.e., biomarkers) listed as “UP” in Table 1 or Table 4 is indicative of prostate cancer. As appreciated by one of skill in the art, biomarkers may be employed in analyzing a patient sample where the biomarker has also been observed infrequently in a normal patient in order to increase the sensitivity of the detection. Given the low frequency of occurrence in normal samples relative to the higher frequency of occurrence in prostate cancer, the presence of, or increase in level of, the biomarker in a patient indicates that the patient has a 95% or greater likelihood of having prostate cancer. Thus, for example, arrays used to detect the chromosomal regions can include those that identify the chromosomal regions in Table 1 or Table 4.

The biomarkers designated as “UP” in Table 1 or Table 4 are associated with prostate cancer, i.e., they are over-represented in prostate cancer patients compared to individuals not diagnosed with prostate cancer. Thus, the detection of an increase, relative to non-prostate cancer patients, in the level of one or more of the biomarkers designated as “UP” in Table 1 or Table 4 is indicative of prostate cancer, i.e., the patient has an increased probability of having prostate cancer compared to a patient that does not have an increase in the level of the biomarker. In some embodiments, the detection of an increase in the level of two or more biomarkers designated as “UP” in Table 1 in the CNA of a patient is indicative of a greater probability for prostate cancer. In other embodiments, the detection of an increase in the level of two or more biomarkers designated as “UP” in Table 4 in the CNA of a patient is indicative of a greater probability for prostate cancer. As understood in the art, other criteria, e.g., clinical criteria, etc., are also employed to diagnose prostate cancer in the patient. Accordingly, patients that have a biomarker associated with prostate cancer also undergo other diagnostic procedures.

In some embodiments, one or more biomarkers that are under-represented in prostate cancer may be detected in the CNA of a patient. Thus, for example, a biomarker listed in Table 1 or Table 4 may be detected in a CNA sample from a patient where the detection of the biomarker is indicative of a normal diagnosis, i.e., that the patient does not have prostate cancer.

“Over-represented” or “increased amount” means that the level of one or more circulating cell-free DNAs is higher than normal levels. Generally this means an increase in the level as compared to an index value. Conversely, “under-represented” or “decreased amount” means that the level of one or more particular circulating cell-free DNA molecules is lower than normal levels. Generally this means a decrease in the level as compared to an index value.

In preferred embodiments, the test value representing the level of a particular circulating cell-free DNA is compared to one or more reference values (or index values), and optionally correlated to prostate cancer or cancer recurrence. Optionally, an increased likelihood of prostate cancer is indicated if the test value is greater than the reference value for CNA listed as “UP” in Table 1 or Table 4, or less than the reference value for CNA listed as “DOWN” in Table 1 or Table 4.

In some embodiments, once a patient has been determined to have at least one biomarker listed in Table 1 or Table 4, a therapy to treat cancer, e.g., prostate cancer, is effected.

Those skilled in the art are familiar with various ways of deriving and using index values. For example, the index value may represent the copy number or concentration of a particular cell-free DNA listed as “UP” in Table 1 or Table 4 in a blood sample from a patient of interest in a healthy state, in which case a copy number or concentration in a sample from the patient at a different time or state significantly higher (e.g., 1.01-fold, 1.05-fold, 1.10-fold, 1.2-fold, 1.3-fold, 1.4-fold, 1.5-fold, 1.6-fold, 1.7-fold, 1.8-fold, 1.9-fold, 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, 20-fold, 30-fold, 40-fold, 50-fold, 100-fold or more higher) than this index value would indicate, e.g., prostate cancer or increased likelihood of cancer recurrence. In some embodiments, the level of the CNA is “increased” if it is at least 1, 2, 3, 4, 5, 10, 15, 20 or more standard deviations greater than the index value in normal subjects. In some embodiments, an index value may represent the copy number or concentration of a particular cell-free DNA listed as “DOWN” in Table 1 or Table 4 in a blood sample from a patient of interest in a healthy state, in which case a copy number or concentration in a sample from the patient at a different time or state significantly lower (e.g., 1.01-fold, 1.05-fold, 1.10-fold, 1.2-fold, 1.3-fold, 1.4-fold, 1.5-fold, 1.6-fold, 1.7-fold, 1.8-fold, 1.9-fold, 2-fold, 3-fold, 4-fold, 5-fold, 10-fold, 20-fold, 30-fold, 40-fold, 50-fold, 100-fold or more lower) than this index value would indicate, e.g., prostate cancer or increased likelihood of prostate cancer recurrence. In some embodiments, the level of the CNA is “decreased” if it is at least 1, 2, 3, 4, 5, 10, 15, 20 or more standard deviations lower than the index value in normal subjects.
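
By way of illustration only, the two kinds of comparison described above (a fold change against an index value, and a distance from the index value expressed in standard deviations of normal subjects) can be sketched in Python. The function names and the numbers in the usage note are hypothetical and are not taken from the study; the sketch assumes the sample standard deviation is the intended measure of spread.

```python
import statistics


def fold_change(test_value, index_value):
    """Ratio of the test value to the index value (e.g., 1.5 means 1.5-fold higher)."""
    if index_value <= 0:
        raise ValueError("index value must be positive")
    return test_value / index_value


def deviations_from_index(test_value, normal_values):
    """Signed distance of the test value from the mean of normal subjects,
    expressed in sample standard deviations of those normal subjects."""
    mean = statistics.mean(normal_values)
    sd = statistics.stdev(normal_values)  # assumption: sample SD is intended
    return (test_value - mean) / sd
```

With hypothetical normal levels of 8, 10 and 12 (mean 10, standard deviation 2), a test level of 16 lies three standard deviations above the index value, and a test level of 15 is 1.5-fold the mean.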

Alternatively, the index value may represent the average concentration or copy number of a particular circulating cell-free DNA for a set of individuals from a diverse cancer population or a subset of the population. For example, one may determine the average copy number or concentration of a circulating cell-free DNA in a random sampling of patients with prostate cancer. Thus, patients having a copy number or concentration (test value) comparable to or higher than this value are identified as having a greater likelihood of having prostate cancer or prostate cancer recurrence than those having a test value lower than this value.

A useful index value may represent the copy number or concentration of a particular circulating cell-free DNA or of a combination (weighted or straight addition) of two or more circulating cell-free DNAs corresponding to the same chromosomal region or different chromosomal regions. When two or more biomarkers or circulating cell-free DNA molecules are used in the diagnosis/monitoring method, the level of each biomarker or circulating cell-free DNA can be weighted and combined. Thus, a test value may be provided by (a) weighting the determined level of each circulating cell-free DNA molecule with a predefined coefficient, and (b) combining the weighted level to provide a test value. The combining step can be either by straight addition or averaging (i.e., weighted equally) or by a different predefined coefficient.
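
The weighting and combining steps (a) and (b) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name, the dictionary layout and the example coefficients are hypothetical, and the specification does not prescribe particular coefficient values.

```python
def combined_test_value(levels, weights=None):
    """Combine per-biomarker levels into a single test value.

    levels  -- dict mapping a biomarker name to its determined level
    weights -- dict mapping the same names to predefined coefficients;
               if omitted, the levels are averaged (i.e., weighted equally)
    """
    if not levels:
        raise ValueError("at least one biomarker level is required")
    if weights is None:
        return sum(levels.values()) / len(levels)
    # (a) weight each level with its coefficient, (b) combine by addition
    return sum(weights[name] * level for name, level in levels.items())
```

For example, levels of 2.0 and 4.0 average to 3.0 when weighted equally, and give 0.5 × 2.0 + 2.0 × 4.0 = 9.0 with hypothetical coefficients of 0.5 and 2.0.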

The information obtained from the biomarker analysis may be stored in a computer readable form. Such a computer system typically comprises major subsystems such as a central processor, a system memory (typically RAM), an input/output (I/O) controller, an external device such as a display screen via a display adapter, serial ports, a keyboard, a fixed disk drive via a storage interface and a floppy disk drive operative to receive a floppy disc, and a CD-ROM (or DVD-ROM) device operative to receive a CD-ROM. Many other devices can be connected, such as a network interface connected via a serial port.

The computer system may also be linked to a network, comprising a plurality of computing devices linked via a data link, such as an Ethernet cable (coax or 10BaseT), telephone line, ISDN line, wireless network, optical fiber, or other suitable signal transmission medium, whereby at least one network device (e.g., computer, disk array, etc.) comprises a pattern of magnetic domains (e.g., magnetic disk) and/or charge domains (e.g., an array of DRAM cells) composing a bit pattern encoding data acquired from an assay of the invention.

The computer system can comprise code for interpreting the results of a study evaluating the presence of one or more of the biomarkers. Thus, in an exemplary embodiment, the biomarker analysis results are provided to a computer where a central processor executes a computer program for determining the likelihood that a patient has prostate cancer.

The invention also provides the use of a computer system, such as that described above, which comprises: (1) a computer; (2) a stored bit pattern encoding the biomarker testing results obtained by the methods of the invention, which may be stored in the computer; (3) and, optionally, (4) a program for determining the likelihood of a patient having prostate cancer.

The invention further provides methods of generating a report based on the detection of one or more biomarkers set forth in Table 1 or Table 4.

Thus, the present invention provides systems related to the above methods of the invention. In one embodiment the invention provides a system for analyzing circulating cell-free DNA, comprising: (1) a sample analyzer for executing the method of analyzing circulating cell-free DNA in a patient's blood, serum or plasma as described in the various embodiments above (incorporated herein by reference); (2) a computer system for automatically receiving and analyzing data obtained in step (1) to provide a test value representing the status (presence or absence or amount, i.e., concentration or copy number) of one or more circulating cell-free DNA molecules having a nucleotide sequence of at least 25 nucleotides falling within a chromosomal region set forth in Table 1 or Table 4, and optionally for comparing the test value to one or more reference values each associated with a predetermined status of prostate cancer. In some embodiments, the system further comprises a display module displaying the comparison between the test value and the one or more reference values, or displaying a result of the comparing step.

Thus, as will be apparent to skilled artisans, the sample analyzer may be, e.g., a sequencing machine (e.g., Illumina HiSeq™, Ion Torrent PGM, ABI SOLiD™ sequencer, PacBio RS, Helicos Heliscope™, etc.), a PCR machine (e.g., ABI 7900, Fluidigm BioMark™, etc.), a microarray instrument, etc.

In one embodiment, the sample analyzer is a sequencing instrument, e.g., a next-generation sequencing instrument such as Roche's GS systems, Illumina's HiSeq and MiSeq, and Applied Biosystems' SOLiD systems. Circulating cell-free DNA molecules are isolated from a patient's blood or serum or plasma, and the sequences of all of the circulating cell-free DNA molecules are obtained using the sample analyzer. A computer system is then employed for automatically analyzing the sequences to determine the level of a circulating cell-free DNA molecule having a nucleotide sequence of at least 25 nucleotides falling within a chromosomal region set forth in Table 1 or Table 4 in the sample. For example, the computer system may compare the sequence of each circulating cell-free DNA molecule in the sample to the sequence, available in the human sequence database, of the chromosomal region to determine if there is a match, i.e., if the sequence of a circulating cell-free DNA molecule falls within a chromosomal region set forth in Table 1 or Table 4. The copy number of a particular circulating cell-free DNA molecule is also automatically determined by the computer system. Optionally the computer system automatically correlates the sequence analysis result with a diagnosis regarding prostate cancer. For example, if one, and preferably two or more, circulating cell-free DNA molecules are identified to be derived from chromosomal regions designated as “UP” in Table 1 or Table 4 and present at an increased level, then the computer system automatically correlates this analysis result with a diagnosis of prostate cancer. If one, and preferably two or more, circulating cell-free DNA molecules are identified to be derived from chromosomal regions designated as “DOWN” in Table 1 or Table 4 and present at a decreased level, then the computer system automatically correlates this analysis result with a diagnosis of prostate cancer. Optionally, the computer system further comprises a display module displaying the results of sequence analysis and/or the result of the correlating step. The display module may be, for example, a display screen, such as a computer monitor, TV monitor, or touch screen, a printer, or audio speakers.
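
The matching and counting step performed by the computer system can be illustrated with a short Python sketch. It assumes that each read has already been mapped to a chromosome and a 1-based position (the alignment itself is not shown). The two regions are copied from rows 1 and 3 of Table 1; the dictionary layout, function name and read positions are hypothetical.

```python
# Two regions copied from Table 1 (rows 1 and 3); this data layout is an assumption.
REGIONS = {
    "Hs8:43050001-43250000": ("Hs8", 43050001, 43250000, "UP"),
    "Hs1:212200001-212400000": ("Hs1", 212200001, 212400000, "DOWN"),
}


def count_reads_per_region(read_positions, regions=REGIONS):
    """Count mapped reads whose position falls within each region (inclusive bounds).

    read_positions -- iterable of (chromosome, position) pairs for mapped reads
    """
    counts = {name: 0 for name in regions}
    for chrom, pos in read_positions:
        for name, (r_chrom, start, stop, _direction) in regions.items():
            if chrom == r_chrom and start <= pos <= stop:
                counts[name] += 1
    return counts
```

The resulting counts would then be normalized and compared to reference values before any correlation with a diagnosis is made.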

The computer-based analysis function can be implemented in any suitable language and/or browsers. For example, it may be implemented with the C language and preferably using object-oriented high-level programming languages such as Visual Basic, SmallTalk, C++, and the like. The application can be written to suit environments such as the Microsoft Windows™ environment including Windows™ 98, Windows™ 2000, Windows™ NT, and the like. In addition, the application can also be written for the MacIntosh™, SUN™, UNIX or LINUX environment. In addition, the functional steps can also be implemented using a universal or platform-independent programming language. Examples of such multi-platform programming languages include, but are not limited to, hypertext markup language (HTML), JAVA™, JavaScript™, Flash programming language, common gateway interface/structured query language (CGI/SQL), practical extraction report language (PERL), AppleScript™ and other system script languages, programming language/structured query language (PL/SQL), and the like. Java™- or JavaScript™-enabled browsers such as HotJava™, Microsoft™ Explorer™, or Netscape™ can be used. When active content web pages are used, they may include Java™ applets or ActiveX™ controls or other active content technologies.

The analysis function can also be embodied in computer program products and used in the systems described above or other computer- or internet-based systems. Accordingly, another aspect of the present invention relates to a computer program product comprising a computer-usable medium having computer-readable program codes or instructions embodied thereon for enabling a processor to carry out the analysis and correlating functions as described above. These computer program instructions may be loaded onto a computer or other programmable apparatus to produce a machine, such that the instructions which execute on the computer or other programmable apparatus create means for implementing the functions or steps described above. These computer program instructions may also be stored in a computer-readable memory or medium that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory or medium produce an article of manufacture including instruction means which implement the analysis. The computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions or steps described above.

In some embodiments, once it has been determined

Assessment of Total CNI in Cell-Free Circulating DNA

In another aspect, the invention provides a method of evaluating the total severity of chromosomal rearrangement in a patient with cancer, such as prostate cancer, irrespective of the site of the chromosomal region. Accordingly, the degree or amount of chromosomal rearrangements in the cancer cells from the patient can be transformed into a biomarker score.

Cell-free circulating DNA can be sequenced using methods described herein. The number of sequences that map to unique regions of the genome can be determined. Methods of quantifying these levels in a patient compared to normal controls are known in the art. In this embodiment, using circulating DNA as the measure, such a score can be calculated in different ways, e.g., by using restricted counts or sums, by using other reference material (e.g., genomic DNA) or distribution models other than the Gaussian, by using different cut-offs for positivity, or by combinations of these. Such scoring will typically depend on the technology used as well as on the number of sequence reads that are generated for any sample.

For example, in some embodiments, the CNIscore from a patient may be compared to an index value CNIscore for normal individuals. Thus, for example, a CNIscore indicative of a cancer, e.g., prostate cancer, may be at least 1, 2, 3, 4, 5, 10, 15, 20 or more standard deviations from the index value in normal subjects. In some embodiments, a patient that is determined to have a CNIscore indicative of cancer, e.g., prostate cancer, receives a therapy to treat the cancer, e.g., radiation, chemotherapy, hormone therapy, etc.

The following examples are provided by way of illustration only and not by way of limitation. Those of skill in the art will readily recognize a variety of non-critical parameters that could be changed or modified to yield essentially similar results.

EXAMPLES

Example 1. Identification of Prostate Cancer (PrCa) Associated CNA

Study Samples

The study evaluated 204 serum samples obtained from patients with prostate cancer (e.g., having a histopathology report of invasive prostate carcinoma), 20 samples from patients with other medical conditions and 207 serum samples from healthy controls (e.g., defined as asymptomatic or having a negative biopsy). Sample sets from multiple centers were used in the trial, where for each given set of cases, the corresponding matching controls originated from the same center. Patient serum samples were obtained from different sites: Ryazan Central Oblast Hospital, Russia (n=100), Dr. Narod in Toronto (n=200), and commercial vendors (e.g., Proteogenex), to achieve a total number of at least 200 cases and 200 matched controls. Of the 200 cases, 89 patients had a Gleason score <7 and 76 cases were from patients diagnosed at ≤65 years of age. Blood was drawn preoperatively from treatment-naïve patients under local IRB approval and processed as described previously (Beck et al., Clin. Chem. 55:730-738, 2009).

Patient samples were run in batches built to include both cases and controls in each batch, in order to avoid batch effects. After initial analyses, samples were analyzed in silico using randomly assigned training and validation sets in an appropriate number of rounds for cluster analysis.

Construction of Sequencing Libraries

After extraction of DNA from serum or plasma, using a standard silica-based method, a whole genome amplification was performed in duplicate. The products of the two reactions were pooled and used for further analysis. In particular, DNA was extracted from ≥200 μL of each sample and used for two independent amplifications using the Genomeplex kit for single cell (Sigma). The P2 adapter used for sequencing and a 10 bp sample-specific nucleotide sequence (also referred to as molecular barcode) were added by PCR using fusion-primers. Two consecutive PCRs with different fusion-primers were performed; the total number of cycles was four. Following the PCRs, the tagged DNA of up to 50 samples was pooled and all further preparations were performed on this pooled DNA material. Further library preparation steps were as follows:

i) Restriction of DNA with endonuclease NlaIII;

ii) Removal of the 3′ overhangs created by NlaIII using the Large Klenow Fragment;

iii) Ligation of P1 (second sequencing adapter that in some instances contains a 10-bp molecular barcode sequence) to the blunted ends;

iv) Amplification of the library by a maximum of 10 cycles of PCR using primers complementary to the P1/P2 adapters of the fragments; and

v) Size-selection using the iBase electrophoresis system and 2% E-Gel size selection agarose gels (Invitrogen) to obtain fragments in the range of 150-250 bp.

Sequencing

Sequencing of the libraries was performed on a SOLiD4+ Instrument (Applied Biosystems) equipped with an EZBead-System (Applied Biosystems) for conducting the emulsion PCRs. All necessary reagents were purchased from Applied Biosystems. Emulsion PCRs and sequencing were performed as recommended by the manufacturer. For some libraries the first ten bases of each read constitute the molecular barcode; therefore, the net read length used for mapping was 40 bp. For other libraries, the barcode sequence is located between an internal adapter and the P2 adapter, and the barcode sequences were obtained in separate sequencing cycles. Therefore the full length of the P1 read (50 bp) was used for mapping against the human genome.

Data Analysis

The sequence reads were assigned to the different samples according to the sequence of the molecular barcode. A total of ten slides were used for the entire study.

The sequences were mapped to the human genome (Build 36.1/Hg18) and the results were stored in binary alignment map (BAM) files. Alignment of raw SOLiD reads was performed using the software BioScope™ ver. 1.2 (Applied Biosystems). The BAM files were used as input data to calculate “hit counts” in bins of 100 kbp with a 50 kbp sliding window using the software suite BedTools ver. 2.14.2 (University of Virginia, Charlottesville, Va.). Table 2 shows an example of the analysis output of one sample and chromosome. From these files the chromosome, bin position and read count were used as input for subsequent analyses.
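
The binning scheme (100 kbp bins advanced in 50 kbp steps, so that each position falls into up to two overlapping bins) can be illustrated with a small Python sketch. This is not the BedTools procedure used in the study; it is a simplified stand-in that assumes 1-based read positions on a single chromosome, and the function name and defaults are hypothetical.

```python
from collections import Counter


def sliding_bin_counts(positions, bin_size=100_000, step=50_000):
    """Count read positions in overlapping bins.

    Bin k covers positions 1 + k*step through k*step + bin_size (inclusive).
    Returns a Counter keyed by the 1-based start coordinate of each bin.
    """
    counts = Counter()
    for pos in positions:
        k_max = (pos - 1) // step                    # last bin starting at or before pos
        k_min = max(0, -((bin_size - pos) // step))  # first bin still covering pos
        for k in range(k_min, k_max + 1):
            counts[1 + k * step] += 1
    return counts
```

For instance, a read at position 100001 is counted in the bins starting at 50001 and 100001, whereas a read at position 1 is counted only in the first bin.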

Once the read counts per bin were determined for each sample (secondary data), the secondary data were used for an in silico training-validation study. From each of the group of cases and the control group, 50% of the samples were randomly assigned to the training set and evaluated (e.g., in an unsupervised cluster search). The resulting clusters were then applied to the remaining 50% of samples (validation set). This procedure was repeated 1227 times per sample set or sample subset.
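
A repeated random 50/50 assignment of this kind, drawn separately from the cases and from the controls, might be sketched in Python as follows. The function names, the fixed seed and the use of integer sample identifiers are assumptions made for illustration; the cluster search applied to each training set is not shown.

```python
import random


def split_half(sample_ids, rng):
    """Randomly split the samples into two halves (training, validation)."""
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def training_validation_rounds(cases, controls, rounds, seed=0):
    """Yield one (training, validation) pair per round.

    Each element of the pair is a (case_ids, control_ids) tuple, so cases and
    controls are split independently, as described in the text.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    for _ in range(rounds):
        case_train, case_val = split_half(cases, rng)
        ctrl_train, ctrl_val = split_half(controls, rng)
        yield (case_train, ctrl_train), (case_val, ctrl_val)
```

Each round would then run the cluster search on the training half and apply the resulting clusters to the validation half.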

Regions of genomic deviation in cancer were selected from the randomized training/validation by means of their segregating power, and used in a final model to be applied on the whole set, or on subsets to be evaluated individually (e.g., regional subsets). Standard ROC analyses along with some categorical analyses were used to evaluate signature performance in the trial overall and among sub-groups of interest.

All data were first normalized to their total counts of reads that matched HG18 uniquely. To account for slide-to-slide variations, the counts per bin were normalized to the ratio per bin and slide using only samples assigned to the control group, using the following equations:

$$\text{run/slide}(i):\quad x_{n,\mathrm{bin}} = \frac{\mathrm{count}_{n,\mathrm{bin}}}{\sum\limits_{\mathrm{bin}=1}^{56684} \mathrm{count}_{n,\mathrm{bin}}} \qquad \text{(Equation 1)}$$

where $\mathrm{count}_{n,\mathrm{bin}}$ is the number of reads per bin of an individual (n) as given in the BED-files. The formula above shows the Global normalization; for Local normalization the divisor is per interrogated chromosome.
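
The Global and Local variants of Equation 1 can be sketched in Python as below. The representation of a sample as a dictionary keyed by (chromosome, bin start), the function name and the toy counts are assumptions for illustration only.

```python
def normalize_counts(counts, mode="global"):
    """Normalize per-bin read counts for one sample, following Equation 1.

    counts -- dict mapping (chromosome, bin_start) to the read count in that bin
    mode   -- "global": divide by the sample's total reads over all bins
              "local":  divide by the total reads on the bin's own chromosome
    """
    if mode == "global":
        total = sum(counts.values())
        return {key: value / total for key, value in counts.items()}
    if mode == "local":
        per_chrom = {}
        for (chrom, _bin_start), value in counts.items():
            per_chrom[chrom] = per_chrom.get(chrom, 0) + value
        return {(chrom, b): value / per_chrom[chrom]
                for (chrom, b), value in counts.items()}
    raise ValueError("mode must be 'global' or 'local'")
```

With toy counts of 10 and 30 in two bins of one chromosome and 60 in a bin of another, the global values are 0.1, 0.3 and 0.6, while the local values are 0.25, 0.75 and 1.0.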

Followed by:

$$\dot{Y}_{n,i,\mathrm{bin}} = \frac{x_{n,i,\mathrm{bin}} \times \bar{x}_{i,\mathrm{bin}}}{\bar{x}_{\mathrm{all},\mathrm{bin}}} \qquad \text{(Equation 2)}$$

where:

$x_{n,i,\mathrm{bin}}$ is, for each bin, the normalized read count of the individual (n) on slide (i);

$\bar{x}_{i,\mathrm{bin}}$ is the average per bin over normal individuals on slide (i);

$\bar{x}_{\mathrm{all},\mathrm{bin}}$ is the average per bin over normal individuals on all slides.

$\bar{x}_{i,\mathrm{bin}}$ and $\bar{x}_{\mathrm{all},\mathrm{bin}}$ are stored for subsequent calculations.

The $\dot{Y}$ values are calculated on the fly for the final definition of diagnostic genomic clusters using an unsupervised cluster search as follows:

The first step of the unsupervised cluster search (UCS) was:

1) Normalization of the reads (per sample)
   a. Global: total reads as basis
   b. Local: reads per chromosome as basis

For 1228 rounds, the data were randomized into a training set (50%) and a validation set (50%). The training sets were used to:

1) Optimize clusters that segregated the disease group from the control group by
   a. Combining consecutive clusters (add $\dot{Y}$ of next bin)
   b. Stopping at maximum of either:
      i. #disease < k-smallest control
      ii. #disease > k-largest control
2) Record when an optimum was found and #disease > 19, otherwise go to 3):
   a. Normalization (Global/Local)
   b. Chromosome
   c. Optimized region (start-stop)
   d. #disease samples positive in training set
   e. #disease samples positive in validation set using:
      i. delimiter from training set
      ii. delimiter from validation set (according to 1b)
   f. C-Statistics
   g. values for each sample (segregated disease/control) in
      i. training set
      ii. validation set
3) Perform analysis on next window

The next randomization was performed and the data recorded into a new table. All regions identified from the UCS above were combined and ranked according to their number of occurrences in the 1228 rounds. Figure illustrates a flowchart of the UCS. In this study k was set to 4.

The result for each sample was then retrieved for the 100 highest ranking regions (Table 3) and further processed for controls and prostate cancer.

A stepwise procedure comprising stepwise out and stepwise in was used to select the final regions. In stepwise out, the data were first cleaned for cross-correlated regions (all regions that did not have more than 14 samples with deviating results were censored). Subsequently, regions that did not have additional information content over other regions were eliminated in a stepwise out approach, where the first 10 regions (highest ranks) were excluded from elimination. In stepwise in, a classical stepwise in procedure was used up to the point where the information content of the combined data reached its limit. The results of both procedural directions are given in Table 3. For the final region selection, regions that held in both stepwise procedures were considered. This resulted in 27 regions to be used as final candidates, followed by the introduction of a weighting factor for final optimization on those 27 regions. Table 4 shows the final selected regions. Table 5 shows the cross-correlation between selected regions.

For each region the delimiter was set to the value corresponding to the k-smallest $\dot{Y}_{\mathrm{region}}$ value of controls for regions denoted “Down” and to the k-largest $\dot{Y}_{\mathrm{region}}$ value of controls for regions denoted “Up”, respectively. Any $\dot{Y}_{\mathrm{region}}$ value greater than the delimiter for a region denoted “Up” or lower than the delimiter for a region denoted “Down” was assigned a Score value of 1. Otherwise the Score was set to 0. Using k=6, for each control and patient sample the CHX-Index was then calculated as:

$$\mathrm{CHXindex} = \sum\limits_{\mathrm{region}=1}^{27} \mathrm{Score} \times \mathrm{Weight} \qquad \text{(Equation 3)}$$
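
The delimiter, Score and CHX-Index calculation can be illustrated with the following Python sketch. It assumes that “k-smallest” and “k-largest” denote the k-th smallest and k-th largest control value; the function names, region labels, weights and values are hypothetical and do not reproduce the weights used in the study.

```python
def delimiter(control_values, direction, k):
    """k-th largest control value for "UP" regions, k-th smallest for "DOWN" regions."""
    ordered = sorted(control_values)
    return ordered[-k] if direction == "UP" else ordered[k - 1]


def region_score(value, delim, direction):
    """Score 1 if the value lies beyond the delimiter in the region's direction, else 0."""
    if direction == "UP":
        return 1 if value > delim else 0
    return 1 if value < delim else 0


def chx_index(sample_values, delimiters, directions, weights):
    """Sum of Score x Weight over all regions (Equation 3)."""
    return sum(
        region_score(sample_values[r], delimiters[r], directions[r]) * weights[r]
        for r in sample_values
    )
```

For example, with control values 1 through 10 and k=2, the delimiter is 9 for an “UP” region and 2 for a “DOWN” region, so a sample value of 9.5 scores 1 in an “UP” region and a value of 2.5 scores 0 in a “DOWN” region.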

The ROC data were calculated from the CHX-Index. ROC curves with 95% confidence intervals were then calculated from the data using the statistical software “Analyse-it for Excel” vers. 2.26 (Analyse-it Software, Ltd.).

Results

The AURoC was 92.7% (95% CI: 0.902-0.951) when comparing 204 PrCa samples vs 207 Controls (see, Table 6). The AURoC was above 85% for the following queries tested: PrCa vs Controls; PrCa vs Controls including ten benign prostate hyperplasia and ten prostatitis samples; Gleason score below 7; Gleason score equal to or above 7; and age ≤65.

Two different library construction and sequencing approaches were tested in the study. For one set of slides, the barcode sequence was positioned within the first 10 bp of the 50 bp sequencing read, leaving 40 bp usable for mapping. For another set of slides, separate barcode sequencing was performed, leaving 50 bp usable for mapping. AURoC values were obtained for each subgroup. The AURoC was 0.91 for 40 bp reads and 0.95 for 50 bp reads. Although the difference was not statistically significant (p=0.06), the trend suggests that read lengths of 50 bp or more would be preferable for future studies.

The other medical conditions (OMC) samples consisted of ten benign prostate hyperplasia and ten prostatitis samples. These were not included in the UCS, but were added as additional controls for confirmation. The ROC AUC did not deteriorate when these samples were added to the original set, which serves as additional support for the usage of the selected regions.

The results of the CHX analysis were compared to the PSA results for those samples known to have a PSA result. Figure shows the scatter plot from a non-parametric Spearman Rank Correlation analysis. The correlation of the CHX and PSA tests had an R(S)-value of 0.501 (0.398 to 0.591) and a t-value of 8.82, corresponding to a p-value of 2.8×10⁻¹⁶. The PSA levels of 22 control individuals were >5.0 μg/l; these individuals have been followed for at least 2 years without any sign of prostate cancer and were therefore assigned to the control group (N.B. the PSA determination was based on the older standard; values for the reference WHO 96/670 are about 80%, so the value of 5.0 above corresponds to 4.0 according to the reference standard).

Example 2. Evaluation of Copy Number Index for Circulating Cell-Free DNA in Prostate Cancer Patients

Copy number instabilities/variations are a known characteristic of malignancies. Therefore, cell-free DNA samples from individual patients were analyzed to determine whether tumor-derived copy number instabilities are quantitatively reflected in the circulating DNA of individual patients. In this example, the chromosomal regions predominantly seen in this tumor patient group were not used; instead, the total severity of chromosomal rearrangements in an individual patient with cancer, irrespective of the chromosomal location, was transformed into a biomarker (CNIscore).

For this example, LOESS normalized sequence read counts (cf. Equation 2) from a subset of samples that consisted of freshly drawn samples with total mappable reads of >1.3 million were used for defining a CNI severity score. The samples were stratified into those from normal male individuals (N=95) and those from prostate cancer patients (N=82). For each 100 kilobase pair bin, the normalized read counts were transformed into Z-values

$$\left( Z = \frac{X_{i} - \bar{X}_{\mathrm{norm}}}{SD_{\mathrm{norm}}} \right),$$

followed by a Parzen-Rosenblatt smoothing (Parzen E (1962) Annals of Mathematical Statistics 33: 1065-1076; Rosenblatt M (1956) Annals of Mathematical Statistics 27: 832-837). For each sample, it was calculated how many bins were found to exhibit a Z-value >1, >2, >3, >4, >5, and so forth. The number of such genomic 100 kbp windows (with copy numbers deviating from the normal group at a given Z-value level) was then counted as a summative score (CNIscore). In addition, the absolute Z-scores above (and below) a certain border can be summed to generate a CNIscore. When using the border of Z >2, the resulting ROC curve from the sum CNIscore is shown in FIG. 3. The AUC to separate the prostate cancer samples from normals was 0.81 for the global normalization and 0.80 for local normalization (not shown). FIG. 4 provides an exemplification of such copy number deviations. FIG. 4a provides a CIRCOS plot (Krzywinski, et al. Genome Res (2009) 19:1639-1645) of five normal individuals, showing the CNA Z-values. In comparing the CIRCOS plot from the normal individuals to a CIRCOS plot (FIG. 4b) of five representative prostate cancer patients, it can clearly be seen that prostate cancer samples exhibited a high accumulation of CNIs in the circulating DNA scattered throughout the genome (only data from global normalization are shown).
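
The two CNIscore variants described above (a count of deviating bins and a sum of absolute Z-values beyond a border) can be sketched in Python. This is a simplified illustration: the Parzen-Rosenblatt smoothing step is omitted, deviations in both directions are captured with the absolute value (an assumption), and the function names and toy values are hypothetical.

```python
import statistics


def z_values(sample, normals):
    """Per-bin Z-values of one sample against the normal group.

    sample  -- list of normalized read counts, one per bin
    normals -- list of such lists, one per normal individual (same bin order)
    """
    zs = []
    for b, x in enumerate(sample):
        column = [individual[b] for individual in normals]
        mean = statistics.mean(column)
        sd = statistics.stdev(column)  # assumption: sample SD of the normal group
        zs.append((x - mean) / sd)
    return zs


def cni_count(zs, border=2.0):
    """Number of bins whose absolute Z-value exceeds the border."""
    return sum(1 for z in zs if abs(z) > border)


def cni_sum(zs, border=2.0):
    """Sum of absolute Z-values over the bins that exceed the border."""
    return sum(abs(z) for z in zs if abs(z) > border)
```

With three toy normal individuals whose first bin has mean 10 and standard deviation 2, a sample value of 16 in that bin gives Z = 3 and contributes to both scores at a border of 2.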

All patents, patent applications, and other published reference materials cited in this specification are hereby incorporated herein by reference in their entirety for their disclosures of the subject matter in whose connection they are cited herein.

TABLE 1 Chromosomal Regions Selected from 1228 runs of randomly selected 50% training sets

#    Chromosome  Region               Start      Stop       Up/Down  Normalization
1    Hs8         43050001-43250000    43050001   43250000   UP       Global
2    Hs15        61450001-61750000    61450001   61750000   UP       Local
3    Hs1         212200001-212400000  212200001  212400000  DOWN     Global
4    Hs8         43150001-43350000    43150001   43350000   UP       Local
5    Hs13        21200001-21800000    21200001   21800000   DOWN     Local
6    Hs17        58300001-58500000    58300001   58500000   UP       Local
7    Hs15        61500001-61900000    61500001   61900000   UP       Global
8    Hs1         148450001-148650000  148450001  148650000  UP       Local
9    Hs17        58250001-58550000    58250001   58550000   UP       Local
10   Hs5         11200001-11600000    11200001   11600000   DOWN     Local
11   Hs8         43150001-43250000    43150001   43250000   UP       Global
12   Hs10        17900001-18200000    17900001   18200000   UP       Local
13   Hs15        61500001-61800000    61500001   61800000   UP       Local
14   Hs9         88650001-88750000    88650001   88750000   UP       Local
15   Hs15        61350001-61750000    61350001   61750000   UP       Local
16   Hs9         88650001-88750000    88650001   88750000   UP       Global
17   Hs1         88950001-89450000    88950001   89450000   UP       Local
18   Hs2         132950001-133150000  132950001  133150000  DOWN     Local
19   Hs4         41850001-42050000    41850001   42050000   DOWN     Local
20   Hs10        89050001-89250000    89050001   89250000   UP       Local
21   Hs8         43100001-43400000    43100001   43400000   UP       Local
22   Hs2         230150001-230450000  230150001  230450000  UP       Global
23   Hs4         186750001-187250000  186750001  187250000  DOWN     Local
24   Hs10        27600001-27900000    27600001   27900000   UP       Local
25   Hs10        109750001-110050000  109750001  110050000  UP       Global
26   Hs8         43100001-43400000    43100001   43400000   UP       Global
27   Hs13        21250001-21750000    21250001   21750000   DOWN     Local
28   Hs3         55450001-55650000    55450001   55650000   DOWN     Local
29   Hs16        67750001-67950000    67750001   67950000   UP       Local
30   Hs3         55400001-55700000    55400001   55700000   DOWN     Local
31   Hs2         132950001-133250000  132950001  133250000  DOWN     Global
32   Hs20        32450001-32550000    32450001   32550000   UP       Local
33   Hs13        21300001-21800000    21300001   21800000   DOWN     Local
34   Hs22        28650001-28850000    28650001   28850000   UP       Local
35   Hs16        57450001-57750000    57450001   57750000   UP       Local
36   Hs10        27650001-27850000    27650001   27850000   UP       Local
37   Hs2         64850001-64950000    64850001   64950000   UP       Local
38   Hs22        28600001-28900000    28600001   28900000   UP       Local
39   Hs10        109750001-110050000  109750001  110050000  UP       Local
40   Hs1         148550001-148750000  148550001  148750000  UP       Local
41   Hs4         186600001-187100000  186600001  187100000  DOWN     Local
42   Hs22        30100001-30300000    30100001   30300000   UP       Local
43   Hs1         88800001-89500000    88800001   89500000   UP       Local
44   Hs20        16000001-16200000    16000001   16200000   DOWN     Global
45   Hs20        40000001-40200000    40000001   40200000   DOWN     Global
46   Hs6         58450001-58550000    58450001   58550000   UP       Global
47   Hs11        10100001-10300000    10100001   10300000   UP       Local
48   Hs20        57850001-58250000    57850001   58250000   UP       Local
49   Hs17        54550001-54750000    54550001   54750000   UP       Local
50   Hs10        32950001-33150000    32950001   33150000   UP       Local
51   Hs16        45650001-45950000    45650001   45950000   UP       Global
52   Hs20        42300001-42700000    42300001   42700000   UP       Local
53   Hs10        17900001-18200000    17900001   18200000   UP       Global
54   Hs13        18350001-18650000    18350001   18650000   DOWN     Local
55   Hs2         47750001-47850000    47750001   47850000   DOWN     Global
56   Hs2         64850001-64950000    64850001   64950000   UP       Global
57   Hs1         197450001-197750000  197450001  197750000  UP       Global
58   Hs2         133000001-133400000  133000001  133400000  DOWN     Global
59   Hs8         42950001-43350000    42950001   43350000   UP       Local
60   Hs2         230150001-230450000  230150001  230450000  UP       Local
61   Hs8         120750001-120950000  120750001  120950000  UP       Local
62   Hs13        69050001-69250000    69050001   69250000   UP       Global
63   Hs17        58350001-58550000    58350001   58550000   UP       Local
64   Hs2         47750001-47850000    47750001   47850000   DOWN     Local
65   Hs12        44300001-44400000    44300001   44400000   UP       Global
66   Hs12        109500001-109600000  109500001  109600000  UP       Local
67   Hs6         58450001-58550000    58450001   58550000   UP       Local
68   Hs1         219250001-219650000  219250001  219650000  UP       Local
69   Hs12        128050001-128650000  128050001  128650000  DOWN     Local
70   Hs20        42500001-42700000    42500001   42700000   UP       Local
71   Hs2         133050001-133350000  133050001  133350000  DOWN     Global
72   Hs7         86350001-86450000    86350001   86450000   UP       Global
73   Hs4         186800001-187200000  186800001  187200000  DOWN     Local
74   Hs20        39950001-40150000    39950001   40150000   DOWN     Global
75   Hs1         88850001-89450000    88850001   89450000   UP       Local
76   Hs12        127950001-128650000  127950001  128650000  DOWN     Local
77   Hs2         133000001-133400000  133000001  133400000  DOWN     Local
78   Hs7         86350001-86450000    86350001   86450000   UP       Local
79   Hs2         230250001-230450000  230250001  230450000  UP       Local
80   Hs12        127950001-128650000  127950001  128650000  DOWN     Global
81   Hs12        44300001-44400000    44300001   44400000   UP       Local
82   Hs8         42950001-43350000    42950001   43350000   UP       Global
83   Hs2         186650001-186750000  186650001  186750000  UP       Global
84   Hs17        61300001-61800000    61300001   61800000   DOWN     Global
85   Hs2         133050001-133350000  133050001  133350000  DOWN     Local
86   Hs12        109500001-109600000  109500001  109600000  UP       Global
87   Hs9         114800001-114900000  114800001  114900000  UP       Local
88   Hs6         58350001-58550000    58350001   58550000   UP       Global
89   Hs2         234200001-234700000  234200001  234700000  DOWN     Local
90   Hs8         120750001-121050000  120750001  121050000  UP       Global
91   Hs6         58400001-58600000    58400001   58600000   UP       Global
92   Hs8         67600001-67700000    67600001   67700000   UP       Global
93   Hs2         235800001-236100000  235800001  236100000  DOWN     Local
94   Hs7         69900001-70100000    69900001   70100000   DOWN     Local
95   Hs12        128050001-128650000  128050001  128650000  DOWN     Global
96   Hs16        75950001-76350000    75950001   76350000   DOWN     Global
97   Hs2         98700001-98900000    98700001   98900000   UP       Global
98   Hs12        95400001-95600000    95400001   95600000   UP       Local
99   Hs2         20200001-20500000    20200001   20500000   DOWN     Local
100  Hs13        21100001-21800000    21100001   21800000   DOWN     Local

TABLE 2 Example of BED-Files (one file per sample and chromosome).

Chromosome  bin-Start  bin-Stop  #reads  bases covered  bin size  Q-bp covered
chr22       15150001   15250000  1       49             99999     0.00049
chr22       15200001   15300000  9       225            99999     0.00225
chr22       15250001   15350000  21      548            99999     0.0054801
chr22       15300001   15400000  16      511            99999     0.0051101
chr22       15350001   15450000  8       329            99999     0.00329
chr22       15400001   15500000  10      337            99999     0.00337
chr22       15450001   15550000  32      561            99999     0.0056101
chr22       15500001   15600000  38      752            99999     0.0075201
chr22       15550001   15650000  40      1160           99999     0.0116001
chr22       15600001   15700000  59      1499           99999     0.0149901
chr22       15650001   15750000  38      1000           99999     0.0100001
chr22       15700001   15800000  38      1046           99999     0.0104601
chr22       15750001   15850000  52      1462           99999     0.0146201
chr22       15800001   15900000  38      1119           99999     0.0111901
chr22       15850001   15950000  54      1338           99999     0.0133801
chr22       15900001   16000000  55      1417           99999     0.0141701
chr22       15950001   16050000  42      1162           99999     0.0116201
chr22       16000001   16100000  43      1250           99999     0.0125001
chr22       16050001   16150000  46      1117           99999     0.0111701
chr22       16100001   16200000  60      1319           99999     0.0131901
chr22       16150001   16250000  66      1664           99999     0.0166402
chr22       16200001   16300000  99      2245           99999     0.0224502
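
By way of illustration only, the relationship between the columns of Table 2 can be sketched in Python. The sketch assumes what the rows of Table 2 appear to show: overlapping 100 kb windows whose start positions advance by 50 kb, a "bin size" equal to bin-Stop minus bin-Start, and a "Q-bp covered" value equal to bases covered divided by bin size, rounded to seven decimal places. The function names sliding_windows and covered_fraction are hypothetical and are not part of the disclosed method.

```python
def sliding_windows(first_start, last_start, size=100_000, step=50_000):
    """Yield (start, stop) pairs for overlapping, 1-based inclusive windows.

    The 100 kb size and 50 kb step are assumptions read off Table 2.
    """
    for start in range(first_start, last_start + 1, step):
        yield start, start + size - 1


def covered_fraction(start, stop, bases_covered):
    """Return bases covered divided by bin size, rounded to 7 decimals.

    Bin size is taken as stop - start (99999 for a 100 kb window),
    which is an assumption that matches the 'bin size' column of Table 2.
    """
    bin_size = stop - start
    return round(bases_covered / bin_size, 7)


# First three chr22 windows of Table 2 with their 'bases covered' values.
windows = list(sliding_windows(15_150_001, 15_250_001))
fractions = [
    covered_fraction(start, stop, bases)
    for (start, stop), bases in zip(windows, [49, 225, 548])
]
```

Under these assumptions the three computed fractions agree with the first three "Q-bp covered" entries of Table 2.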

TABLE 3 Results of the stepwise selection procedure of the 100 highest ranking regions as in Table 1

Regions: 27   AUC (204|207)   FINAL AUC 0.926719238   0.922385621

#    Stepwise Out  Stepwise IN  Region                      FINAL  Weight
1    1.5           1.5          Gchr8 43050001-43250000     1      1.5
2    1             0            Lchr15 61450001-61750000    0      0
3    0.49          0.49         Gchr1 212200001-212400000   1      0.49
4    0.49          0            Lchr8 43150001-43350000     0      0
5    0.49          0            Lchr13 21200001-21800000    0      0
6    0.49          1            Lchr17 58300001-58500000    1      1.5
7    1             1            Gchr15 61500001-61900000    1      1
8    0.49          1            Lchr1 148450001-148650000   1      0.49
9    0.49          0            Lchr17 58250001-58550000    0      0
10   0             0            Lchr5 11200001-11600000     0      0
11   0             0            Gchr8 43150001-43250000     0      0
12   1             1.5          Lchr10 17900001-18200000    1      1.5
13   0             0            Lchr15 61500001-61800000    0      0
14   1             1.5          Lchr9 88650001-88750000     1      1.5
15   0             1            Lchr15 61350001-61750000    0      0
16   0             1            Gchr9 88650001-88750000     0      0
17   0             0            Lchr1 88950001-89450000     0      0
18   0             0            Lchr2 132950001-133150000   0      0
19   0             0            Lchr4 41850001-42050000     0      0
20   1             1            Lchr10 89050001-89250000    1      1
21   0             0            Lchr8 43100001-43400000     0      0
22   1             0            Gchr2 230150001-230450000   0      0
23   0             0            Lchr4 186750001-187250000   0      0
24   1             1            Lchr10 27600001-27900000    1      1
25   0             0            Gchr10 109750001-110050000  0      0
26   0             0            Gchr8 43100001-43400000     0      0
27   1             1            Lchr13 21250001-21750000    1      1
28   0             0            Lchr3 55450001-55650000     0      0
29   1             1            Lchr16 67750001-67950000    1      1
30   0.49          0            Lchr3 55400001-55700000     0      0
31   0             0            Gchr2 132950001-133250000   0      0
32   0             0            Lchr20 32450001-32550000    0      0
33   1             1            Lchr13 21300001-21800000    1      1
34   0             1            Lchr22 28650001-28850000    0      0
35   0.49          0            Lchr16 57450001-57750000    0      0
36   1             1            Lchr10 27650001-27850000    1      1
37   0             0            Lchr2 64850001-64950000     0      0
38   0             0            Lchr22 28600001-28900000    0      0
39   0             0            Lchr10 109750001-110050000  0      0
40   0             0            Lchr1 148550001-148750000   0      0
41   1             1            Lchr4 186600001-187100000   1      1
42   0             1            Lchr22 30100001-30300000    0      0
43   0             1            Lchr1 88800001-89500000     0      0
44   0             1            Gchr20 16000001-16200000    0      0
45   0             0            Gchr20 40000001-40200000    0      0
46   1             1            Gchr6 58450001-58550000     1      1
47   1             0            Lchr11 10100001-10300000    0      0
48   1             1            Lchr20 57850001-58250000    1      1
49   0             0            Lchr17 54550001-54750000    0      0
50   0             0            Lchr10 32950001-33150000    0      0
51   0             0            Gchr16 45650001-45950000    0      0
52   1             1            Lchr20 42300001-42700000    1      1
53   0             1            Gchr10 17900001-18200000    0      0
54   1.5           1.5          Lchr13 18350001-18650000    1      1.5
55   1             0.49         Gchr2 47750001-47850000     1      0.49
56   0             0            Gchr2 64850001-64950000     0      0
57   0             0            Gchr1 197450001-197750000   0      0
58   0             0            Gchr2 133000001-133400000   0      0
59   0             0            Lchr8 42950001-43350000     0      0
60   0             0            Lchr2 230150001-230450000   0      0
61   1.5           1            Lchr8 120750001-120950000   1      1
62   0             0            Gchr13 69050001-69250000    0      0
63   0             0            Lchr17 58350001-58550000    0      0
64   0             0            Lchr2 47750001-47850000     0      0
65   0.49          1            Gchr12 44300001-44400000    1      1
66   1             1            Lchr12 109500001-109600000  1      1
67   0             0            Lchr6 58450001-58550000     0      0
68   0             0            Lchr1 219250001-219650000   0      0
69   1             0            Lchr12 128050001-128650000  0      0
70   0             0            Lchr20 42500001-42700000    0      0
71   0             0            Gchr2 133050001-133350000   0      0
72   0.49          1.5          Gchr7 86350001-86450000     1      1
73   0             0            Lchr4 186800001-187200000   0      0
74   0             0            Gchr20 39950001-40150000    0      0
75   0             0            Lchr1 88850001-89450000     0      0
76   0             0            Lchr12 127950001-128650000  0      0
77   0             0            Lchr2 133000001-133400000   0      0
78   0             0            Lchr7 86350001-86450000     0      0
79   0             0            Lchr2 230250001-230450000   0      0
80   0             0            Gchr12 127950001-128650000  0      0
81   0             0            Lchr12 44300001-44400000    0      0
82   0             1            Gchr8 42950001-43350000     0      0
83   1             1            Gchr2 186650001-186750000   1      1
84   1.5           1.5          Gchr17 61300001-61800000    1      1
85   0             0            Lchr2 133050001-133350000   0      0
86   0             0.49         Gchr12 109500001-109600000  0      0
87   0             0            Lchr9 114800001-114900000   0      0
88   0             0            Gchr6 58350001-58550000     0      0
89   0             0            Lchr2 234200001-234700000   0      0
90   0             0            Gchr8 120750001-121050000   0      0
91   0             0            Gchr6 58400001-58600000     0      0
92   0             0            Gchr8 67600001-67700000     0      0
93   0             0            Lchr2 235800001-236100000   0      0
94   0             0            Lchr7 69900001-70100000     0      0
95   0             1            Gchr12 128050001-128650000  0      0
96   0             0            Gchr16 75950001-76350000    0      0
97   1             1.5          Gchr2 98700001-98900000     1      1
98   1             1.5          Lchr12 95400001-95600000    1      1
99   0             0            Lchr2 20200001-20500000     0      0
100  0             0            Lchr13 21100001-21800000    0      0

TABLE 4 Final Selected Regions.

#   Region  Start-Stop           Up/Down  Weight
1   Gchr8   43050001-43250000    UP       1.5
3   Gchr1   212200001-212400000  DOWN     0.49
6   Lchr17  58300001-58500000    UP       1.5
7   Gchr15  61500001-61900000    UP       1
8   Lchr1   148450001-148650000  UP       0.49
12  Lchr10  17900001-18200000    UP       1.5
14  Lchr9   88650001-88750000    UP       1.5
20  Lchr10  89050001-89250000    UP       1
24  Lchr10  27600001-27900000    UP       1
27  Lchr13  21250001-21750000    DOWN     1
29  Lchr16  67750001-67950000    UP       1
33  Lchr13  21300001-21800000    DOWN     1
36  Lchr10  27650001-27850000    UP       1
41  Lchr4   186600001-187100000  DOWN     1
46  Gchr6   58450001-58550000    UP       1
48  Lchr20  57850001-58250000    UP       1
52  Lchr20  42300001-42700000    UP       1
54  Lchr13  18350001-18650000    DOWN     1.5
55  Gchr2   47750001-47850000    DOWN     0.49
61  Lchr8   120750001-120950000  UP       1
65  Gchr12  44300001-44400000    UP       1
66  Lchr12  109500001-109600000  UP       1
72  Gchr7   86350001-86450000    UP       1
83  Gchr2   186650001-186750000  UP       1
84  Gchr17  61300001-61800000    DOWN     1
97  Gchr2   98700001-98900000    UP       1
98  Lchr12  95400001-95600000    UP       1

Gchr = global normalization / Lchr = local normalization

TABLE 5 Cross-Correlation Table.

#deviant calls  Region1  Region2
0               55       64
0               66       86
1               65       81
2               37       56
2               71       85
3               12       53
4               46       67
4               69       95
5               58       77
5               59       82
5               72       78
6               14       16
6               76       80
7               21       26
8               4        82
8               25       39
9               4        59
11              4        11
11              4        21
12              1        82
12              21       59
12              69       76
12              76       95
14              43       75
14              69       80
14              80       95
15              1        11
15              1        59

TABLE 6 Performance Analysis

Sample set size
(#PrCa|CNTRLS)   PrCa                  CNTRLS                AUC    CI (95%)
204|207          All                   Normals               0.927  0.902-0.951
89|207           Gleason < 7           Normals               0.954  0.929-0.978
84|207           Gleason ≥ 7           Normals               0.913  0.878-0.949
204|227          All                   All + OMC             0.927  0.902-0.951
89|227           Gleason < 7           All + OMC             0.954  0.929-0.978
84|227           Gleason ≥ 7           All + OMC             0.913  0.877-0.948
192|201          41 ≤ Age ≤ 81*        41 ≤ Age ≤ 81*        0.920  0.893-0.946
76|174           41 ≤ Age ≤ 65         41 ≤ Age ≤ 65         0.938  0.911-0.966
113|118          Sequence size: 40mer  Sequence size: 40mer  0.907  0.871-0.943
91|109           Sequence size: 50mer  Sequence size: 50mer  0.948  0.915-0.980

*Age range between youngest PrCa (41) and oldest control sample (81).

What is claimed is:
 1. A method of analyzing circulating free DNA in a patient sample, comprising measuring, in a sample that is blood, serum or plasma, the level of a first cell-free DNA having a sequence at least 25 nucleotides in length unambiguously assigned to a first chromosomal region set forth in Table 1, and a second cell-free DNA having a sequence at least 25 nucleotides in length unambiguously assigned to a second chromosomal region set forth in Table 1, wherein the sequences of said first and second cell-free DNAs are free of repetitive elements.
 2. The method of claim 1, wherein said patient has or is suspected of having prostate cancer.
 3. The method of claim 1, further comprising measuring in said sample the level of a third cell free DNA having a sequence at least 25 nucleotides in length unambiguously assigned to a third chromosomal region set forth in Table 1, wherein said third chromosomal region is different from said first and second chromosomal regions, and the sequence of said third cell free DNA is free of repetitive elements.
 4. The method of claim 1, further comprising measuring in said sample at least 5, 8, 10, 20, 30, 40, 50, 60, 70, 75, 80, 85, or 90 additional different cell free DNAs, each having a sequence at least 25 nucleotides in length and free of repetitive elements, wherein each sequence is unambiguously assigned to a different chromosomal region set forth in Table 1.
 5. The method of claim 1, further comprising measuring the level of all of the cell-free DNAs in the sample that have a sequence at least 25 nucleotides in length unambiguously assigned to a chromosomal region listed in Table 1.
 6. The method of claim 1, further comprising effecting a cancer therapy.
 7. A kit comprising: a first plurality of oligonucleotides wherein each oligonucleotide within each of said plurality has a nucleotide sequence falling within the same first chromosomal region set forth in Table 1; and a second plurality of oligonucleotides each having a nucleotide sequence falling within the same second chromosomal region set forth in Table 1, wherein said first and second chromosomal regions are different and wherein said oligonucleotides are free of repetitive elements.
 8. The kit according to claim 7, wherein the first chromosomal region and the second chromosomal region are set forth in Table 4.
 9. The kit according to claim 7, wherein said oligonucleotides are attached to a solid substrate.
 10. A method of analyzing DNA in a patient sample, comprising: preparing a sequencing library of circulating cell-free DNA by performing whole genome amplification on cell-free DNA isolated from a blood, serum or plasma sample from a patient; sequencing DNA from the sequencing library; unambiguously assigning the sequences to a region of the human genome to identify genomic windows that represent the regions of the genome that comprise the sequences; and determining genomic windows for which the number of reads significantly differs from normal controls.
 11. The method of claim 10, wherein the number of windows that deviate from normal is determined and compared to normal controls.
 12. The method of claim 11, wherein the sum of reads in one or more windows is determined and compared to normal.
 13. The method of claim 11, wherein the patient has prostate cancer. 